Code Review Interview: Questions, Rubric, and Why It Beats Algorithms
What a code review interview actually is
A code review interview is exactly what it sounds like: the candidate is given a working pull request — 100 to 400 lines, a single feature or bug fix — and asked to review it as if a teammate had opened it. They flag issues, suggest changes, and walk the interviewer through their reasoning. Sometimes there are deliberate bugs. Always there are judgment calls.
It is the most underused format in technical hiring. Most loops still default to a whiteboard algorithm round or a take-home build. Both are fine signals for specific things. Neither tests the skill engineers spend the most time on after they are hired.
Why it beats the algorithm round
Three reasons, in order of impact.
- It tests the actual job. Engineers often spend more hours reviewing code than writing it from scratch. A loop that doesn't measure reviewing ability is measuring the wrong dimension and calling it engineering.
- It is hard to fake with AI. LLMs can write the diff. They struggle to argue about why a function should be split when a senior reviewer has a defensible case for the other side. The signal is in the back-and-forth, which is where LLM-assisted candidates collapse.
- It scales across seniority. Juniors notice syntax and bugs. Mids notice structure. Seniors notice the design choices the diff is hiding (what was deleted, what was deliberately not changed). Same artifact, very different reads — which means one PR can drive interviews from L3 to L6 with adjusted rubric weights instead of separate question banks.
This is the format teams reach for when they have realized LeetCode is not predictive and a pure work-sample take-home is too expensive to run at top-of-funnel volume.
How to pick the codebase
The PR you use determines whether the interview measures judgment or trivia. Three things matter:
- Use real, anonymized code. A toy snippet with three obvious bugs trains candidates to look for tricks. A real diff with mixed-quality changes forces them to actually read.
- Pick a domain the candidate will not have memorized. Avoid a TODO app or a LeetCode-style algorithm. Use something with business logic, such as a pricing rule, a permission check, or a retry policy. Domain familiarity is not the signal.
- Make sure there is at least one defensible judgment call. The best code review interview has zero "wrong" answers and three places where a thoughtful reviewer could push back either way. That is where you measure seniority.
Run the round with the diff loaded into a real editor, not a static PDF. ClarityHire's collaborative coding rooms open the file with annotations and a comment thread the interviewer can watch in real time, so the review feels like a normal GitHub PR rather than an interrogation.
The four-dimension scoring rubric
Borrow the structure from a system design rubric but rebuild the dimensions for review work.
1. Issue identification
Did they catch the things a real review would catch? Score on density and severity:
- 4 — caught the load-bearing issue (security, correctness, or architecture) plus the small stuff
- 3 — caught the load-bearing issue or most of the small stuff
- 2 — caught surface-level issues but missed the structural problem
- 1 — missed the load-bearing issue and nitpicked formatting
2. Prioritization
Two issues rarely carry the same weight. Score whether the candidate distinguishes "this is a bug" from "this could be cleaner" from "I would not block on this." Senior engineers anchor the review around the bugs; juniors treat every comment as equal priority.
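One way to make this dimension gradable is to have each comment tagged with a priority and then check whether the tags actually differ. The following is a minimal sketch, not a prescribed tool: the `Priority` labels are taken from the three phrasings above, while the `ReviewComment` type, the helper names, and the sample comments are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical labels mirroring the three tiers named above."""
    BUG = 0            # "this is a bug"
    CLEANUP = 1        # "this could be cleaner"
    NON_BLOCKING = 2   # "I would not block on this"


@dataclass(frozen=True)
class ReviewComment:
    line: int          # line in the diff the comment is attached to
    text: str
    priority: Priority


def in_posting_order(comments: list[ReviewComment]) -> list[ReviewComment]:
    """Sort by priority first, then by position in the diff."""
    return sorted(comments, key=lambda c: (c.priority, c.line))


def treats_everything_equally(comments: list[ReviewComment]) -> bool:
    """True when every comment carries the same priority."""
    return len({c.priority for c in comments}) <= 1


# Illustrative comments only; not taken from a real review.
comments = [
    ReviewComment(88, "Rename `tmp` to something descriptive.", Priority.NON_BLOCKING),
    ReviewComment(42, "Retry loop never resets the backoff counter.", Priority.BUG),
    ReviewComment(17, "This helper could be extracted.", Priority.CLEANUP),
]

ordered = in_posting_order(comments)
print([c.line for c in ordered])            # [42, 17, 88]
print(treats_everything_equally(comments))  # False
```

A review where `treats_everything_equally` returns `True` is a prompt for a follow-up question, not an automatic low score: the candidate may simply have found only one kind of issue.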
3. Justification quality
For each comment they leave, can they articulate the trade-off? "I'd extract this helper" is a B-tier comment. "I'd extract this helper because the same logic appears in the order-cancellation flow and we want them to evolve together" is the senior version. Score for specificity, not vocabulary.
4. Tone and collaboration
Code review is a relationship skill. A reviewer who is technically right but writes comments your team will resent is a net negative. Score whether each comment lands as a suggestion the author can act on or as a takedown.
This is the dimension most interview formats cannot measure at all. Code review interviews surface it cleanly.
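The four dimensions can be rolled up into one number per candidate, with the weights adjusted by seniority. The sketch below assumes every dimension is scored on the same 1 to 4 scale (the text only spells that scale out for issue identification) and uses made-up weights for two hypothetical bands; tune both to your own loop.

```python
DIMENSIONS = ("issue_identification", "prioritization", "justification", "tone")

# Hypothetical weights per seniority band; each row sums to 1.0.
WEIGHTS = {
    "junior": {"issue_identification": 0.5, "prioritization": 0.2,
               "justification": 0.2, "tone": 0.1},
    "senior": {"issue_identification": 0.2, "prioritization": 0.3,
               "justification": 0.3, "tone": 0.2},
}


def weighted_score(scores: dict[str, int], level: str) -> float:
    """Combine per-dimension scores (assumed 1-4) into a weighted average."""
    weights = WEIGHTS[level]
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 4:
            raise ValueError(f"{dim} must be between 1 and 4, got {scores[dim]}")
    return round(sum(weights[d] * scores[d] for d in DIMENSIONS), 2)


# Illustrative scores: strong at spotting issues, weaker at arguing for them.
candidate = {"issue_identification": 4, "prioritization": 2,
             "justification": 2, "tone": 3}
print(weighted_score(candidate, "junior"))  # 3.1
print(weighted_score(candidate, "senior"))  # 2.6
```

The same set of dimension scores reads as a solid junior result and a middling senior one, which is the effect of reweighting a single PR across levels.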
Sample questions to anchor the conversation
These are the prompts interviewers should hold back until after the candidate has done their own pass:
- "Walk me through the comments you'd actually post, in the order you'd post them."
- "What single comment would you make blocking?"
- "If the author pushes back on your biggest comment, what's your second-best argument?"
- "What's not in this diff that should be? What would you ask in the PR description?"
- "What would you not bring up, even though you noticed it?"
The last question is the one that best separates senior candidates from the rest. Knowing what to ignore is half of effective code review.
How to grade an open-ended review
The interview produces messy artifacts — inline comments, a transcript of the verbal walkthrough, sometimes a written summary. Three rules for grading them consistently:
- Use a shared scoring template per role. Build it once from your rubric library and reuse across candidates.
- Score the artifact and the conversation separately. A candidate who wrote great comments but argued poorly in the live discussion is a different hire than one who did the reverse.
- Run a written-only round at scale and a live round for finalists. AI-assisted grading of the written comments gives a useful first pass on density and specificity at top-of-funnel, where you cannot afford live time. The live round picks up the conversation signal that written grading cannot.
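Keeping the two tracks separate can be as simple as storing them in separate fields and never averaging across them. The following is a rough sketch of such a template; the `Scorecard` type, its field names, the 1 to 4 scale, and the one-point `gap` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    """Hypothetical per-candidate template; scores assumed to be on a 1-4 scale."""
    candidate: str
    artifact: dict[str, int]       # scores from the written comments
    conversation: dict[str, int]   # scores from the live discussion

    def track_means(self) -> tuple[float, float]:
        """Return (artifact mean, conversation mean) without merging them."""
        a = sum(self.artifact.values()) / len(self.artifact)
        c = sum(self.conversation.values()) / len(self.conversation)
        return a, c

    def profile(self, gap: float = 1.0) -> str:
        """Label which track is stronger when the means differ by at least `gap`."""
        a, c = self.track_means()
        if a - c >= gap:
            return "stronger in writing"
        if c - a >= gap:
            return "stronger in discussion"
        return "balanced"


# Illustrative scores only.
card = Scorecard(
    candidate="candidate-001",
    artifact={"issue_identification": 4, "justification": 4},
    conversation={"justification": 2, "tone": 3},
)
print(card.track_means())  # (4.0, 2.5)
print(card.profile())      # stronger in writing
```

Reporting the profile alongside both means keeps the "great comments, weak discussion" candidate visibly distinct from the reverse case.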
Resisting AI assistance during the round
If the candidate runs the diff through Claude or GPT in another tab, the comments they paste tend to be technically dense but stylistically inconsistent with their own argumentation in the discussion. Two ways to surface that:
- Watch the edit pattern: clean, large pastes of structured prose followed by silence are the tell, the same way they are in a coding round.
- Make the live discussion mandatory and adversarial: pick the candidate's strongest comment and argue against it. AI-authored comments collapse when the candidate cannot defend the specific phrasing they chose.
You do not need to ban AI to make the round work. You need to design it so an AI-only candidate cannot pass the conversation half.
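If your editor records edit events, the "large paste followed by silence" pattern can be flagged with a few lines of code. This is only a sketch under stated assumptions: the `EditEvent` shape is hypothetical (no specific editor or platform API is implied), and the 300-character and 60-second thresholds are placeholders to calibrate against your own sessions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    """Hypothetical editor event: seconds since session start, characters inserted at once."""
    t: float
    chars_inserted: int


def suspicious_pastes(
    events: list[EditEvent],
    min_chars: int = 300,
    min_silence_s: float = 60.0,
) -> list[EditEvent]:
    """Return large single inserts that are followed by a long pause."""
    ordered = sorted(events, key=lambda e: e.t)
    flagged = []
    # Compare each event with the one after it; the final event has no follower.
    for current, following in zip(ordered, ordered[1:]):
        is_large = current.chars_inserted >= min_chars
        is_followed_by_silence = following.t - current.t >= min_silence_s
        if is_large and is_followed_by_silence:
            flagged.append(current)
    return flagged


# Illustrative session log only.
events = [
    EditEvent(10.0, 12),    # ordinary typing
    EditEvent(14.0, 9),
    EditEvent(95.0, 640),   # one large insert...
    EditEvent(200.0, 5),    # ...then nothing for 105 seconds
    EditEvent(203.0, 350),  # large insert, but activity resumes quickly
    EditEvent(210.0, 7),
]
print([e.t for e in suspicious_pastes(events)])  # [95.0]
```

A flagged event is a cue for which comment to challenge in the live discussion, not evidence on its own; candidates also paste from their own notes.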
What to do next
Pick one PR from your codebase, anonymize it, and pilot the round with two current engineers next week. Have them review the diff cold and score themselves against the rubric. Whatever discrepancy shows up between their score and your gut sense of how they actually review at work is the discrepancy your rubric is fixing. Then run it on your next loop and compare candidate scores against your existing coding assessment results. In our customer base, the rank ordering shifts — and the new order maps better to who actually performs after hire.
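To see how much the rank ordering moves in your own pilot, a rank correlation between the two sets of scores is a reasonable first measure. The sketch below implements Spearman's rank correlation in plain Python; the candidate scores are invented solely to show the calculation and are not results from any real loop.

```python
def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (lowest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    rx, ry = ranks(xs), ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are tied")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five candidates, purely to show the calculation.
coding_assessment = [82, 75, 90, 60, 70]
review_round = [2.1, 3.4, 2.8, 3.0, 1.9]
print(round(spearman(coding_assessment, review_round), 2))  # -0.1
```

A value near 1 means the two rounds rank candidates almost identically; a value near 0 means the review round is ordering them very differently. Whether the new order is the better one still has to be checked against post-hire performance, and a handful of candidates is too few to read much into the number.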