
How to Interview Engineers for AI-Assisted Coding Skill

ClarityHire Team (Editorial), 2026-06-11, 7 min read

The skill nobody is interviewing for

By mid-2026, most engineers use an AI assistant for some part of their daily work. The question your interview should answer is not "can this candidate code without one" — that is a hypothetical. It is "can this candidate ship faster and safer with one than the engineer sitting next to them."

That is a different skill, and it requires a different interview. A take-home with a hidden no-AI rule does not measure it. A LeetCode screen does not measure it. Even most open-book formats let a candidate slide through by treating the AI as a souped-up search engine. Below is a tactical guide to designing a round that measures AI fluency directly, and scoring it without rewarding theater.

What AI fluency actually looks like

Watch a strong engineer use an assistant for an hour. You will see them do five things:

Frame the prompt around constraints, not solutions. They tell the model what the code has to do and not do, then iterate.
Reject confidently. When the model proposes something subtly wrong — a wrong API version, a hallucinated library, a fragile pattern — they catch it within seconds and steer.
Use the assistant most on the boring parts. Boilerplate, test scaffolding, format conversions — high-leverage uses. They write the load-bearing logic themselves or hold the model to a tight specification.
Verify before they trust. They run the code, read the diff, and check the edges. They do not commit on vibes.
Know when to stop using it. When the model is going in circles, they switch back to thinking and reading.

A weak AI user does the opposite: asks broad questions, accepts long generations uncritically, lets the model architect their solution, and ships whatever runs. The interview's job is to distinguish these two.

The question format that surfaces it

The shape of the question matters more than the specific problem. Pick one with these properties:

The naive AI answer is wrong in a subtle way. Not a trick — a real constraint the candidate has to notice. A common version: ask them to implement something against a recent or unusual API, where the model's training data is stale.
The problem is too large for a no-AI candidate to finish, and too subtle for a full-AI candidate to one-shot. This forces both tools and judgment.
The acceptance criteria are clear. Not "build something nice" — explicit must-haves the candidate can verify against.
There is a deliberate ambiguity in the spec. A reasonable engineer would either resolve it by asking or by stating an assumption. A weak AI user pastes the spec into the model and ships whatever comes back.

A working example: "Build a small command that reads a CSV of orders, computes per-customer monthly totals, and outputs JSON. Two ambiguities you should resolve: order timezones, and how to handle refund rows. 60 minutes. Any tool you want."
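
For calibration, it helps to have a reference answer that states its assumptions out loud. The sketch below is one possible solution, not the expected one. The column names (`customer_id`, `ordered_at`, `amount`, `type`) are hypothetical, and it resolves the two ambiguities by assumption: months are bucketed in UTC, and a refund row subtracts from the month in which the refund itself is dated.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical input: the column names are assumptions, not part of the prompt.
SAMPLE_CSV = """customer_id,ordered_at,amount,type
c1,2026-01-31T23:30:00-05:00,100.00,order
c1,2026-02-10T09:00:00+00:00,40.00,refund
c2,2026-01-15T12:00:00+00:00,25.50,order
"""


def monthly_totals(csv_text: str) -> dict[str, dict[str, str]]:
    """Return {customer_id: {"YYYY-MM": total}} with totals as decimal strings."""
    totals: dict[str, dict[str, Decimal]] = defaultdict(lambda: defaultdict(Decimal))
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["ordered_at"])
        if ts.tzinfo is None:
            # Assumption: timestamps without an offset are already UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        # Assumption: months are bucketed in UTC, not the customer's local time.
        month = ts.astimezone(timezone.utc).strftime("%Y-%m")
        amount = abs(Decimal(row["amount"]))
        if row["type"] == "refund":
            # Assumption: a refund reduces the month in which the refund is dated.
            amount = -amount
        totals[row["customer_id"]][month] += amount
    return {
        customer: {month: str(total) for month, total in sorted(months.items())}
        for customer, months in sorted(totals.items())
    }


if __name__ == "__main__":
    print(json.dumps(monthly_totals(SAMPLE_CSV), indent=2))
```

The first sample row is the interesting one: 23:30 on January 31 at UTC-5 is already February in UTC, so a candidate who never thinks about timezones files that order under the wrong month. Decimal is used instead of float so currency totals do not pick up rounding noise.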

Run it live, not async

A purely async take-home cannot distinguish the two profiles. The artifact looks the same either way. Run this as a 60–75 minute live coding session with screen share or a collaborative editor, with explicit permission to use AI.

Ground rules announced up front:

Any AI tool is fair game. Tell us which ones you use.
We will not deduct points for using AI. We will score how you use it.
We will probe what the AI gave you. Be ready to explain anything you keep.

This framing shifts the candidate from defensive to demonstrative. Strong candidates lean in. Weak ones get nervous, because their workflow does not survive scrutiny.

What to watch for in real time

Five concrete signals during the session:

Prompt quality. Do they explain constraints, examples, and the shape of the desired output? Or do they paste the spec and hope?
Reject rate. How often do they discard a generation? Strong engineers reject at least a few, sometimes silently, sometimes out loud. Engineers who keep everything are not reading what they ship.
Where they leave AI alone. Do they write the critical logic themselves? Or do they let the assistant own the part that determines correctness?
Verification habits. Do they run the code on real inputs? Do they read the diff before accepting? Do they look at edges?
Recovery. When the model goes in circles, do they step back and think, or do they keep prompting?

These are observable in a 60-minute window if you watch the keystrokes, not just the artifact. ClarityHire's collaborative editor records the full session with paste events tagged separately from typed input — useful when you want to revisit a specific choice with the candidate during the walk-through.
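
If an interviewer wants to put a number on the reject-rate signal, a hand-kept tally is enough. The sketch below assumes the observer notes each AI generation as kept, edited, or rejected. The `Generation` record and the `outcome_shares` helper are hypothetical names for illustration and are not a ClarityHire export format.

```python
from collections import Counter
from dataclasses import dataclass

OUTCOMES = ("kept", "edited", "rejected")


@dataclass(frozen=True)
class Generation:
    """One AI suggestion as noted by the interviewer (hypothetical record)."""
    minute: int   # minutes into the session when the suggestion appeared
    outcome: str  # one of OUTCOMES, as judged by the observer

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def outcome_shares(log: list[Generation]) -> dict[str, float]:
    """Share of generations per outcome; all zeros if nothing was generated."""
    if not log:
        return {outcome: 0.0 for outcome in OUTCOMES}
    counts = Counter(g.outcome for g in log)
    return {outcome: counts[outcome] / len(log) for outcome in OUTCOMES}


# Example tally from a single session (made-up values).
SESSION = [
    Generation(4, "kept"),
    Generation(11, "rejected"),
    Generation(19, "edited"),
    Generation(27, "rejected"),
    Generation(41, "kept"),
]

if __name__ == "__main__":
    print(outcome_shares(SESSION))
```

The number is a prompt for the walk-through rather than a pass mark: a rejected share of zero is the case worth asking about, since it suggests nothing was read critically.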

A scoring rubric

Five dimensions, 1–4 each, scored before the debrief:

Dimension	Weak (1)	Strong (4)
Prompting	Pastes the spec, asks broad questions	Frames constraints, gives examples, iterates
Critical reading	Accepts long generations unchecked	Rejects, edits, and rewrites within seconds
Judgment of leverage	Uses AI on the load-bearing logic	Uses AI on boilerplate, owns the critical path
Verification	Ships untested or barely tested code	Runs against real inputs, reads diffs, checks edges
Recovery	Loops with the model when stuck	Switches to reading code or asking a clarifier

Score each independently. ClarityHire's structured interview scorecards lock these so reviewers cannot drift after seeing the artifact.
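
Teams without a scorecard tool can approximate the same discipline in a few lines. This is a minimal sketch, assuming one immutable record per reviewer. The `Scorecard` class and its field names are illustrative, and the `total` helper is a convenience this sketch adds, since the rubric itself only asks for five independent scores.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)  # frozen: scores cannot be edited after they are recorded
class Scorecard:
    """One reviewer's scores, each on the 1-4 scale from the rubric."""
    prompting: int
    critical_reading: int
    judgment_of_leverage: int
    verification: int
    recovery: int

    def __post_init__(self) -> None:
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, int) or not 1 <= value <= 4:
                raise ValueError(f"{field.name} must be an integer 1-4, got {value!r}")

    def total(self) -> int:
        """Sum of the five dimensions (range 5-20). An optional summary only."""
        return sum(getattr(self, field.name) for field in fields(self))


if __name__ == "__main__":
    card = Scorecard(
        prompting=3,
        critical_reading=4,
        judgment_of_leverage=2,
        verification=4,
        recovery=3,
    )
    print(card.total())
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which mirrors the intent of recording scores before the debrief and leaving them alone afterward.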

The post-session walk-through

Spend the last 15 minutes on three questions:

"Show me a generation you rejected. Why?"
"Walk me through the part of this you wrote yourself, and why you didn't let the AI do it."
"What's the most fragile part of what we shipped? What would you fix next?"

Candidates who actually reasoned during the session can answer these without missing a beat. Candidates who accepted whatever the model produced cannot, and the gap shows up in the first ten seconds of each answer. This is the same authorship test you would run on a take-home, applied to live AI use.

Where this fits in the loop

Treat this round as a replacement for the standard live coding screen, not an addition. Your loop should still include a system design round and a behavioral round; the question this round answers is the one previously handled by the closed-book coding screen, which has lost most of its signal in the LLM era.

What to do next

Three concrete moves before your next AI-allowed round:

Pick one role and rewrite its live coding question so that AI cannot one-shot it and a no-AI candidate cannot finish it in the time allowed.
Train your interviewers to score the workflow, not the artifact. The artifact is now table stakes. The workflow is the signal.
Decide what "verified" means before the candidate arrives. Locking in the rubric in advance prevents the post-hoc rationalization that wrecks calibration.

The teams that get this right in 2026 will out-hire the teams that pretend the model is not in the room. The skill is real, the gap between strong and weak users is enormous, and the interview format to measure it is no harder to run than the one it replaces.
