AI-Assisted Coding Interview Policy: How to Design and Grade One
The policy question every engineering team is being forced to answer
Meta opened the door first. Shopify, Canva, and Rippling followed within a quarter. By mid-2026, roughly one in four employers explicitly permits AI assistance during technical interviews, and the curve is steepening. The question on most engineering hiring committees has shifted from "Should we ban ChatGPT?" to "If 90% of our engineers code with AI daily, what does an honest interview look like?"
This post lays out the policy and rubric we recommend to teams designing an AI-permitted interview from scratch. It is not a permission slip for vibe-coding. It is a way to run an interview where the candidate uses the same tools they will use on day one, and you still walk away with a defensible signal.
Start by deciding what you are actually measuring
The mistake almost every team makes when they first allow AI is leaving the rubric unchanged. They take their old "implement this LRU cache" prompt, hand the candidate a Claude tab, and act surprised when everyone passes.
If you allow AI, you are no longer grading whether the candidate can produce working code. You are grading four things that the AI cannot do for them:
- Problem framing. Can they translate a vague business prompt into a concrete spec, including edge cases the requester did not mention?
- Decision-making. Can they decide what to build first, what to skip, and when to throw out a draft?
- Critical review. When the AI returns confidently wrong code, do they notice? Do they ask for an alternative, or do they ship the bug?
- Justification. Can they defend every line they submitted? "The model wrote it" is not an answer.
Write that down before you write a single problem. The rubric drives everything else.
Designing a prompt the AI cannot one-shot
A good AI-permitted prompt has three properties:
- It is under-specified on purpose. The candidate has to ask clarifying questions or make assumptions explicit. A prompt that includes every constraint up front lets the candidate dump the spec into the model and ship.
- The "happy path" is only half the work. The other half is handling a constraint that contradicts an earlier assumption. Real engineering is reconciling tensions, not solving LeetCode puzzles.
- It has at least one trap: a subtle requirement that, if missed, makes the solution look right but behave wrong on a non-obvious input. Models often miss these; careful candidates catch them.
Concrete example. Instead of "implement a rate limiter," ask: "Add a rate limit to this existing endpoint. The product team wants 'reasonable' protection against abuse. The endpoint is called from both a mobile app and an internal cron job. Here is the existing code." Now the candidate has to define "reasonable," notice that the cron job will trip a naive per-IP limiter, and explain their tradeoffs. The AI will happily write a per-IP limiter. The candidate has to decide it is wrong.
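For interviewers who want to see the trap in code, here is a minimal Python sketch contrasting the per-IP limiter a model tends to produce with one alternative a candidate might defend. Everything in it is hypothetical: the `Request` fields (`client_ip`, `api_key`, `is_internal`), the in-memory token bucket, and the choice to exempt the internal caller entirely rather than give it its own, larger budget. A real answer depends on the existing code in your prompt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: bursts up to `capacity`, refills at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float | None = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape; the endpoint in your prompt will differ.
    client_ip: str
    api_key: str | None = None  # set for authenticated mobile-app users
    is_internal: bool = False   # set for the internal cron job


class KeyedLimiter:
    """One token bucket per key; subclasses decide how requests map to keys."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}

    def key_for(self, req: Request) -> str | None:
        raise NotImplementedError

    def allow(self, req: Request, now: float) -> bool:
        key = self.key_for(req)
        if key is None:  # request is exempt from limiting
            return True
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.rate)
        return self.buckets[key].allow(now)


class PerIpLimiter(KeyedLimiter):
    """The naive answer: every caller behind one IP shares one bucket."""

    def key_for(self, req: Request) -> str | None:
        return req.client_ip


class PerClientLimiter(KeyedLimiter):
    """One defensible alternative: exempt the internal caller, key users by
    API key, and fall back to IP only for anonymous traffic."""

    def key_for(self, req: Request) -> str | None:
        if req.is_internal:
            return None
        if req.api_key:
            return f"key:{req.api_key}"
        return f"ip:{req.client_ip}"


if __name__ == "__main__":
    cron = Request(client_ip="10.0.0.5", is_internal=True)
    naive = PerIpLimiter(capacity=5, rate=1.0)
    keyed = PerClientLimiter(capacity=5, rate=1.0)
    # The cron job fires 20 calls at the same instant (now=0.0).
    print(sum(naive.allow(cron, now=0.0) for _ in range(20)))  # 5: throttled
    print(sum(keyed.allow(cron, now=0.0) for _ in range(20)))  # 20: exempt
```

The naive limiter throttles the cron job once its burst allowance is spent, because every call arrives from the same internal address. The keyed version lets the cron job through, keeps separate budgets for users who happen to share an IP, and still limits anonymous traffic by IP. Neither is the "right" answer; the signal is whether the candidate can explain why they chose one.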
For more on building prompts that resist LLM-pasted solutions, see "How to screen developers without LeetCode."
What the interview actually looks like
A live AI-assisted coding interview runs 60 minutes, in three blocks:
- 10 minutes — framing. The interviewer presents the problem. The candidate reads it, asks clarifying questions, states their assumptions. They are explicitly not coding yet. This is the highest-signal portion of the interview and is mostly AI-immune.
- 35 minutes — building, with AI available. The candidate writes code, prompts the model when they want, and narrates what they are doing. The interviewer watches for how the AI is used: are they prompting with the right context, accepting output uncritically, or treating it like a junior pair?
- 15 minutes — defense and extension. The interviewer asks the candidate to walk through a function the AI generated, change a constraint, and explain what breaks. This is where weak candidates collapse.
This is the same structure as a strong live coding interview, with one explicit twist: the AI is not a forbidden device but a tool, and how well the candidate uses it is part of the grade.
Grading rubric: five dimensions, five points each
Use the same rubric across every candidate and every interviewer, with anchored behaviors at each level. A working starting point:
| Dimension | What it measures |
|---|---|
| Problem framing | Quality of clarifying questions and stated assumptions before any code is written. |
| Decision-making | Did they pick the right thing to build first? Did they stop and reconsider when they hit a dead end? |
| AI fluency | Are prompts specific and well-scoped? Do they reject bad outputs and ask for alternatives? |
| Critical review | When the AI is wrong, do they catch it? Do they test the trap case? |
| Justification | Can they explain every line, change it on request, and defend tradeoffs? |
Score each dimension 1–5 with behavioral anchors. Anchors are what separate this from vibes; if you have never calibrated a rubric, do that before you run the interview. Free-form scoring across multiple interviewers is how you end up with the loudest voice deciding hires.
Build the form once in your scorecard system and reuse it. If your scorecard cannot enforce that every dimension is rated before submission, get one that can.
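If you are evaluating a scorecard tool, or building a stopgap yourself, the enforcement rule is small enough to sketch. This is a minimal Python illustration, not any particular product's API: `Scorecard`, `Rating`, and the requirement that each score carry a note of the observed behavior are our assumptions; only the five dimension names and the 1–5 scale come from the rubric above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The five dimensions from the rubric table above.
DIMENSIONS = (
    "Problem framing",
    "Decision-making",
    "AI fluency",
    "Critical review",
    "Justification",
)


@dataclass
class Rating:
    score: int     # 1-5, matched against that level's behavioral anchor
    evidence: str  # observed behavior backing the score (assumed requirement)


@dataclass
class Scorecard:
    candidate: str
    interviewer: str
    ratings: dict[str, Rating] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return every reason this scorecard cannot be submitted yet."""
        issues: list[str] = []
        for dim in DIMENSIONS:
            rating = self.ratings.get(dim)
            if rating is None:
                issues.append(f"{dim}: not rated")
                continue
            if not (isinstance(rating.score, int) and 1 <= rating.score <= 5):
                issues.append(f"{dim}: score must be an integer from 1 to 5")
            if not rating.evidence.strip():
                issues.append(f"{dim}: no observed behavior recorded")
        # Reject free-form extra dimensions so every interviewer uses one rubric.
        for dim in sorted(set(self.ratings) - set(DIMENSIONS)):
            issues.append(f"{dim}: not a rubric dimension")
        return issues

    def submit(self) -> dict[str, int]:
        """Refuse submission until every dimension is validly rated."""
        issues = self.problems()
        if issues:
            raise ValueError("Scorecard incomplete: " + "; ".join(issues))
        return {dim: self.ratings[dim].score for dim in DIMENSIONS}
```

The design choice that matters is that `submit` refuses to return anything until `problems` is empty, rather than trusting interviewers to remember every field.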
The integrity question: trust but verify
Allowing AI does not mean trusting that the person on the call is the one being hired. Two specific risks remain:
- Impersonation. Someone else is on the keyboard or whispering in the candidate's ear off-camera.
- Coached answers. The candidate has a second device feeding them framing language and "clarifying questions" they did not actually generate.
Neither risk has anything to do with the AI policy. They exist in any remote interview. ClarityHire's integrity layer is built for exactly this world: we do not flag candidates for using AI when AI is allowed. We do verify that the same person typed throughout, that their face matches the identity on file, and that the work product reflects what we observed them produce. The keystroke biometrics and face continuity signals catch impersonation regardless of whether the candidate used Claude.
If you are running AI-permitted interviews, your integrity report should say nothing about AI use and everything about who did the work.
What to do next
If you are converting an existing interview loop to allow AI, work in this order:
- Rewrite one prompt to be under-specified, with a hidden constraint. Pilot it with two internal engineers before any candidate sees it.
- Replace the old "coding correctness" rubric with the five-dimension rubric above. Anchor every level with a behavioral example.
- Calibrate three interviewers on a recorded session before going live with candidates.
- Decide on your integrity policy separately from your AI policy. They are not the same question.
- After ten loops, pull the score distributions and check for drift. If everyone is getting 4s on AI fluency, your anchors are too generous.
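The distribution check in the last step is easy to automate. Here is a minimal Python sketch, assuming each submitted scorecard can be exported as a mapping from dimension name to 1–5 score. The `score_report` name and the 80% cutoff for "too generous" are illustrative assumptions, not a standard; pick a threshold your calibration sessions can defend.

```python
from __future__ import annotations

from collections import Counter
from statistics import mean


def score_report(
    cards: list[dict[str, int]],
    min_loops: int = 10,
    high_share_limit: float = 0.8,
) -> dict[str, dict]:
    """Summarize per-dimension score distributions and flag generous anchors.

    `cards` holds one dict per submitted scorecard (dimension -> score).
    A dimension is flagged when more than `high_share_limit` of its
    scores are 4 or 5.
    """
    if len(cards) < min_loops:
        raise ValueError(f"need at least {min_loops} loops, got {len(cards)}")
    report: dict[str, dict] = {}
    for dim in sorted({d for card in cards for d in card}):
        scores = [card[dim] for card in cards if dim in card]
        high_share = sum(s >= 4 for s in scores) / len(scores)
        report[dim] = {
            "distribution": dict(sorted(Counter(scores).items())),
            "mean": round(mean(scores), 2),
            "high_share": round(high_share, 2),
            "too_generous": high_share > high_share_limit,
        }
    return report
```

A flagged dimension is a prompt to revisit its anchors in the next calibration session, not proof that interviewers are scoring wrong.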
The teams getting this right are not the ones with the strictest policies. They are the ones whose rubrics measure judgment instead of typing.