
How to Design AI-Resistant Coding Interview Questions

ClarityHire Team (Editorial) · 2026-06-03 · 7 min read

You cannot ban the AI. You can design around it.

By 2026, asking candidates to swear off AI assistants for an interview is theater. The bar candidates clear with ChatGPT open in another tab is the bar your competitors hire against. Detecting AI use is half the job, and we cover it in how to detect ChatGPT in coding interviews. The other half is designing problems that do not collapse when the candidate has an LLM available.

This is a guide to the second half: how to write coding interview problems that produce real signal even when the AI is on, and where the AI's help maps cleanly onto skills you actually want to hire for.

Why standard problems no longer work

A typical LeetCode problem has three properties that make it trivial for an LLM:

It is in the training set. Every classic two-pointer, sliding window, and dynamic programming problem has been answered, explained, and re-explained across the internet.
The input is well-specified. Two integer arrays, a target, return the indices. The model knows exactly what to produce.
The output is binary. Tests pass or fail. There is no judgment in the answer for the candidate to defend.

A problem with all three properties is, in 2026, a ChatGPT skill check, not a programming skill check. Each property you remove makes the problem harder to outsource.

Principle 1: anchor the problem in code the LLM has never seen

The cleanest way to take training-set advantage off the table is to make the candidate work inside a specific codebase. An LLM can write a graph traversal cold. It cannot reliably extend a 400-line repo where two of the modules use an internal convention it has never encountered.

What this looks like in practice:

Fix-the-bug. A 200–400 line repo with a non-obvious bug. The symptom is given, the cause is not. The candidate has to read, hypothesize, test, and patch. We outline the format in our LeetCode-free interviewing guide.
Extend-the-feature. A working app with a small, well-scoped feature request. The diff quality is the assessment. An LLM can generate a plausible diff; it cannot guarantee the diff fits the existing conventions and passes the existing tests on the first run.
Refactor under a constraint. "This file is 600 lines and hard to read. Split it into three files without changing behaviour. Tests must still pass." The right answer is a judgment call. There is no canonical solution to memorize.

These formats neutralize the training-set advantage because the LLM cannot have seen the repo. The candidate's ability to navigate the unfamiliar code becomes the signal.
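
As a rough illustration of what "an internal convention the model has never encountered" can mean, here is a minimal sketch of a fix-the-bug seed. Everything in it is invented for this example: the `Booking` type, the day-number representation, and the repo-wide rule that date ranges are half-open. The reported symptom would be something like "guests cannot check in on the day another guest checks out."

```python
from dataclasses import dataclass


# Hypothetical repo-wide convention (an assumption of this sketch):
# every date range is half-open, [start_day, end_day).
@dataclass(frozen=True)
class Booking:
    start_day: int  # first night, inclusive
    end_day: int    # checkout day, exclusive


def nights(booking: Booking) -> int:
    """Follows the convention: the checkout day is not a night."""
    return booking.end_day - booking.start_day


def overlaps_buggy(a: Booking, b: Booking) -> bool:
    # Planted bug: treats end_day as inclusive, which contradicts the
    # half-open convention used by nights() and the rest of the repo.
    return a.start_day <= b.end_day and b.start_day <= a.end_day


def overlaps_fixed(a: Booking, b: Booking) -> bool:
    # Half-open ranges overlap only when each starts strictly before
    # the other one ends.
    return a.start_day < b.end_day and b.start_day < a.end_day


if __name__ == "__main__":
    leaving = Booking(start_day=1, end_day=3)
    arriving = Booking(start_day=3, end_day=5)
    print(overlaps_buggy(leaving, arriving))  # back-to-back stays reported as a clash
    print(overlaps_fixed(leaving, arriving))  # no clash under the half-open rule
```

The patch is one character per comparison. What the exercise measures is whether the candidate finds the convention by reading `nights()` and the surrounding modules rather than guessing at the comparison operators.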

Principle 2: make the input ambiguous on purpose

LLMs are at their strongest when the spec is precise. They are at their weakest when the spec is missing information that a human would notice and ask about.

A standard problem: "Given a CSV of orders, compute monthly revenue."

A harder-to-outsource version: "Here is a CSV of orders. Compute monthly revenue. Some rows are refunds and are represented as negative amounts. Some rows are partial refunds and use a different status value. Some rows are from a discontinued product line that was not actually shipped. Decide which of these to include and explain your reasoning before you write any code."

The reasoning step is the assessment. An LLM will happily produce code that handles the canonical case and quietly drops the rest. A real engineer will pause, list the ambiguities, and ask a clarifying question — or document the assumptions they made.

Grade the candidate's clarifications and assumptions separately from the code that follows. The clarification list is harder to fake than the implementation.
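
One shape a strong answer can take is code where each assumption is a named, reviewable switch instead of a silent default. The sketch below is illustrative only: the column names (`order_date`, `amount`, `status`, `product_line`), the status value `partial_refund`, and the assumption that partial refunds carry negative amounts are all hypothetical, since the prompt deliberately leaves them open.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RevenuePolicy:
    """Each field records one decision the candidate had to make explicitly."""
    include_refunds: bool = True            # full refunds reduce revenue
    include_partial_refunds: bool = True    # assumed to be negative amounts too
    excluded_product_lines: frozenset = frozenset({"discontinued"})  # never shipped


def monthly_revenue(csv_text: str, policy: RevenuePolicy = RevenuePolicy()) -> dict:
    """Sum amounts per YYYY-MM month under the stated policy."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["product_line"] in policy.excluded_product_lines:
            continue
        amount = Decimal(row["amount"])
        is_partial = row["status"] == "partial_refund"
        if is_partial and not policy.include_partial_refunds:
            continue
        if amount < 0 and not is_partial and not policy.include_refunds:
            continue
        totals[row["order_date"][:7]] += amount  # assumes ISO dates, YYYY-MM-DD
    return dict(totals)


# Invented sample data for the sketch.
SAMPLE = (
    "order_date,amount,status,product_line\n"
    "2026-01-05,100.00,paid,core\n"
    "2026-01-09,-20.00,refund,core\n"
    "2026-01-12,-5.50,partial_refund,core\n"
    "2026-01-20,40.00,paid,discontinued\n"
    "2026-02-01,60.00,paid,core\n"
)

if __name__ == "__main__":
    print(monthly_revenue(SAMPLE))
    gross_only = RevenuePolicy(include_refunds=False, include_partial_refunds=False)
    print(monthly_revenue(SAMPLE, gross_only))
```

A reviewer can grade the `RevenuePolicy` fields and their comments as the clarification list, then grade the loop as the implementation, which keeps the two scores separate.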

Principle 3: shift weight onto the conversation around the code

A submission is a sample, not an answer. An LLM can produce a submission. Only the candidate can defend it under live questioning.

Two ways to lean into this:

Async-then-sync. Run the take-home or async coding round first, then schedule a 30-minute live round where the candidate walks you through their submission. Ask them to extend it, change a constraint, or argue why they picked one data structure over another. We unpacked this format in follow-up questions for take-home submissions.
Live pairing on the candidate's own code. Mid-interview, change one requirement and ask them to refactor in front of you. The candidate who pasted the original solution from an LLM will struggle to evolve it without re-prompting.

In ClarityHire we instrument the live round with code coherence checks on the original submission, so the interviewer walks in knowing which sections of the candidate's code look LLM-generated and which look hand-written. The conversation can then target the suspect sections specifically.

Principle 4: pick problems where AI help is a feature, not a bug

If your job posting says "you will use AI tools every day in this role," then design problems that let you observe how candidates actually use them. This is not the same as letting them paste the answer.

Open-book with attribution. State explicitly: "You may use ChatGPT or any documentation. Document every prompt you used and the part of the solution it produced." A strong candidate uses the model surgically; a weak one prompts five times for the same broken function. The trace is the signal.
AI as an opponent. Give the candidate a buggy AI-generated function as the starting point. Their job is to identify what is wrong, why, and fix it. This tests the skill the role actually requires: critically reviewing AI output before shipping it.
AI-assisted debugging. Hand them an unfamiliar repo with a real bug, allow tool use, and watch how they pin down the cause. Engineers who lean on the model for narrow questions ("what does this stack trace mean?") look very different from those who paste the whole file in and pray.
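
For the "AI as an opponent" format, the starting point can be very small. The sketch below is a hand-written stand-in for the kind of plausible-looking function a model might produce, not real model output; the function names and the planted defect are invented for illustration.

```python
def chunk_starter(items: list, size: int) -> list:
    """Split items into consecutive chunks of at most `size` elements.

    This is the version handed to the candidate. It passes a happy-path
    test where len(items) is a multiple of size.
    """
    # Planted defect: the range stops early, so a trailing partial
    # chunk is silently dropped.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_reviewed(items: list, size: int) -> list:
    """One acceptable fix: iterate over every start index and validate size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    print(chunk_starter([1, 2, 3, 4], 2))      # looks correct on even input
    print(chunk_starter([1, 2, 3, 4, 5], 2))   # the last element goes missing
    print(chunk_reviewed([1, 2, 3, 4, 5], 2))  # trailing chunk is kept
```

The grading question is less "did they fix it" and more "did they find the input that exposes it, and can they explain why the docstring and the code disagree."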

We explore the bigger picture of this format in open-book coding assessments.

Principle 5: combine prevention with detection — never rely on one

Even the best-designed problem only raises the cost of outsourcing; it does not tell you whether outsourcing happened. Pair the design choices above with passive integrity signals:

Keystroke biometrics flag burst-paste events that do not match human authorship.
Code coherence analysis catches stylistic shifts mid-submission.
A live follow-up catches authors who cannot defend their own code.

Each one is weak alone. Together they make the cost of cheating higher than the cost of doing the work.
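
To make "weak alone, stronger together" concrete, here is a minimal sketch of combining the three signals. It is not a description of how any product implements them: the `EditEvent` log format, the 200-character paste threshold, and the two-of-three rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    t: float             # seconds since the session started
    chars_inserted: int  # characters added by this single editor event


def burst_pastes(events: list, min_chars: int = 200) -> list:
    """Return events that insert a large block at once (threshold is a guess)."""
    return [e for e in events if e.chars_inserted >= min_chars]


@dataclass(frozen=True)
class IntegritySignals:
    burst_paste: bool      # from the editor event log
    style_shift: bool      # from a code coherence check
    failed_followup: bool  # candidate could not defend the code live


def needs_review(signals: IntegritySignals, min_agreeing: int = 2) -> bool:
    """Escalate only when several independent signals agree."""
    fired = sum([signals.burst_paste, signals.style_shift, signals.failed_followup])
    return fired >= min_agreeing


if __name__ == "__main__":
    log = [EditEvent(1.0, 1), EditEvent(1.2, 1), EditEvent(30.0, 850)]
    pasted = bool(burst_pastes(log))
    print(needs_review(IntegritySignals(pasted, False, False)))  # one signal alone
    print(needs_review(IntegritySignals(pasted, False, True)))   # two signals agree
```

Requiring agreement keeps a single noisy signal, such as a candidate pasting their own snippet from a scratch file, from turning into an accusation.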

A short checklist before you ship the question

Before you put a coding problem in front of a candidate in 2026, ask yourself:

Could ChatGPT solve this from the problem statement alone? If yes, redesign or reduce its weight in the loop.
Is the spec ambiguous in a way that rewards clarifying questions? If not, add ambiguity.
Does the candidate have to defend or evolve their solution live? If not, schedule a follow-up.
Are you measuring how they use AI, or pretending they will not? Be honest with the candidate either way.
Do you have an integrity signal that confirms the same person wrote the submission and attended the follow-up? If not, add one.

What to do next

If your current coding round is a single LeetCode-style problem with a take-home dropbox, you are testing prompt engineering at this point, not engineering. Pick one of the formats above, write one problem in it, and run it on three internal engineers before you put it in front of a candidate. The first version will be too easy or too unclear; the third will be the one that produces real signal.

For broader context on where async fits in the loop, see async vs live technical interviews. For the detection half of this same problem, start with how to detect ChatGPT in coding interviews.
