How to Set a Passing Score for a Coding Assessment
The wrong way: "let's go with 70%"
Most teams pick a passing score by polling the room. "What feels right? 70? Let's say 75." The number gets written down, the assessment goes live, and it is revisited only when the funnel hurts — either too few candidates clear the bar and the role stays open, or too many do and the live interview round is buried.
A cut score is not a vibe. It is the most consequential single decision in your hiring funnel: every point you move it shifts who reaches the next stage. It deserves a method.
What a cut score actually represents
Formally, a cut score is the threshold separating a minimally qualified candidate (MQC) — the lowest skill level you would still hire — from someone you would not advance. Everything else flows from that one definition.
Two common confusions:
- A cut score is not the expected score of a typical hire. It is the floor. Most of your hires should clear it comfortably.
- A cut score is not a measure of test difficulty. Two assessments of very different difficulty can target the same MQC, and because that candidate scores lower on the harder test, the two cut scores will differ.
Step 1: Write down the MQC
Before you touch the number, write a one-paragraph description of the minimally qualified candidate for this role on this assessment. Be specific:
"A backend engineer with 2–3 years of production experience who can implement a CRUD endpoint with input validation, write a test for the happy path, and handle one edge case without prompting. They may miss one of the three secondary edge cases. They produce readable code but not always elegant code."
If you cannot write this paragraph, you are not ready to set a cut score. Run a calibration session with the hiring panel until you can.
Step 2: Pick a method
Three methods practitioners use, in increasing order of rigor:
Angoff-style estimation
A panel of 3–5 engineers familiar with the role takes the assessment themselves, then reviews each item and estimates the probability that an MQC would get it right. Average the panel's estimates for each item, sum those averages across items, and you have a draft cut score. Crude but defensible, and far better than nothing.
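The Angoff arithmetic is short enough to sketch. The version below is a minimal illustration, not a standard implementation: the function name `angoff_cut`, the optional per-item `points` weights, and the panel data are all made up for the example.

```python
from statistics import mean

def angoff_cut(ratings, points=None):
    """Draft cut score from panel estimates.

    ratings: one list per panelist; ratings[p][i] is panelist p's estimated
             probability (0.0 to 1.0) that an MQC gets item i right.
    points:  optional points per item; defaults to 1 point each.
    """
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every panelist must rate every item")
    if points is None:
        points = [1.0] * n_items
    # Average across panelists per item, then sum the expected points.
    item_means = [mean(r[i] for r in ratings) for i in range(n_items)]
    return sum(p * m for p, m in zip(points, item_means))

# Hypothetical panel: three engineers rating four one-point items.
panel = [
    [0.9, 0.7, 0.5, 0.2],
    [0.8, 0.6, 0.6, 0.3],
    [1.0, 0.8, 0.4, 0.1],
]
draft = angoff_cut(panel)
print(f"draft cut: {draft:.2f} of {len(panel[0])} points")
```

Items where panelists disagree widely are worth discussing before the estimates are averaged. The disagreement usually means the panel is picturing different MQCs.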
Borderline-group method
Identify 10–20 past candidates whom the team agreed were "on the bubble" — could have gone either way. Look at their actual scores. The median of that group is your draft cut. This requires historical data but uses real evidence rather than reviewer guesses.
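As a rough sketch, the borderline-group calculation is a median plus a sanity check on spread. The scores below are invented, and the 0 to 100 scale is an assumption.

```python
from statistics import median, quantiles

# Hypothetical scores (0-100) of past candidates the panel called "on the bubble".
borderline_scores = [58, 61, 62, 64, 65, 66, 66, 68, 70, 71, 73, 77]

# The median of the borderline group is the draft cut.
draft_cut = median(borderline_scores)

# The interquartile range shows how tightly the group clusters. A wide spread
# suggests the "borderline" label was applied inconsistently.
q1, _, q3 = quantiles(borderline_scores, n=4)

print(f"draft cut: {draft_cut}, middle half of the group: {q1} to {q3}")
```

The median is used rather than the mean so that one or two mislabeled candidates do not drag the draft cut.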
Contrasting-groups method
Pull two sets of past candidates: those who got an offer and performed well in their first six months, and those who either got an offer and underperformed or were rejected at the next round. Find the score that best separates the two distributions. It is the strongest method when you have the data and the weakest when you do not.
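"Best separates" needs a criterion. One common choice, used in the sketch below, is Youden's J: the share of the strong group that passes plus the share of the weak group that fails, minus one. This is one reasonable option rather than the only one, and the function name and data are hypothetical.

```python
def contrasting_groups_cut(strong, weak):
    """Return (cut, j) for the threshold that best separates the two groups.

    A candidate passes when score >= cut. j is Youden's J statistic:
    (share of strong who pass) + (share of weak who fail) - 1.
    """
    best_cut, best_scaled = None, -1
    # J only changes at observed scores, so those are the only candidates.
    for cut in sorted(set(strong) | set(weak)):
        strong_pass = sum(s >= cut for s in strong)
        weak_fail = sum(w < cut for w in weak)
        # Cross-multiply so the comparison stays in exact integers.
        scaled = strong_pass * len(weak) + weak_fail * len(strong)
        if scaled > best_scaled:  # on ties, keep the lower (more lenient) cut
            best_cut, best_scaled = cut, scaled
    j = best_scaled / (len(strong) * len(weak)) - 1
    return best_cut, j

# Hypothetical scores for the two outcome groups.
strong = [62, 68, 71, 74, 75, 79, 82, 85, 88, 91]
weak = [41, 48, 52, 55, 58, 60, 63, 66, 70, 72]
cut, j = contrasting_groups_cut(strong, weak)
print(f"draft cut: {cut}, separation (J): {j:.2f}")
```

A J near 1 means the groups barely overlap. A J near 0 means the assessment does not distinguish them, and no cut score will fix that.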
Most teams should start with Angoff for the first version of a new assessment and migrate to borderline-group or contrasting-groups as data accumulates.
Step 3: Calibrate against your actual pool
Once you have a draft cut, run it against your last 50–100 completed assessments and compute three numbers:
- Pass rate. What fraction of all submissions clear the bar?
- Top-quartile pass rate. Of candidates you would obviously interview, how many clear?
- Bottom-quartile pass rate. Of candidates you would obviously reject, how many clear?
Two failure modes to look for:
- Top-quartile pass rate below 90% — your cut is too high; you are rejecting people you want.
- Bottom-quartile pass rate above 10% — your cut is too low; you are letting through people you do not want to interview.
If both numbers look right, ship the cut. If either is off, the right move is usually not to slide the cut up or down by 5% — it is to look at which questions are misclassifying. Often one or two badly calibrated questions are responsible, and fixing them is better than retuning the threshold.
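The three numbers and both failure modes can be computed in a few lines. The sketch below assumes each past submission carries a panel label ("yes" for obviously interview, "no" for obviously reject, "unclear" otherwise). That labeling scheme, the function name, and the data are assumptions for illustration.

```python
def calibration_report(submissions, cut, top_floor=0.90, bottom_ceiling=0.10):
    """Compute the three pass rates for a draft cut.

    submissions: list of (score, label) pairs, where label is the panel's
                 judgment: "yes", "no", or "unclear" (an assumed scheme).
    """
    def pass_rate(rows):
        # None signals "no data" rather than a misleading 0%.
        return sum(score >= cut for score, _ in rows) / len(rows) if rows else None

    top = pass_rate([r for r in submissions if r[1] == "yes"])
    bottom = pass_rate([r for r in submissions if r[1] == "no"])
    return {
        "pass_rate": pass_rate(submissions),
        "top_pass_rate": top,
        "bottom_pass_rate": bottom,
        # The two failure modes, using the 90% and 10% thresholds.
        "cut_too_high": top is not None and top < top_floor,
        "cut_too_low": bottom is not None and bottom > bottom_ceiling,
    }

# Hypothetical history; a real run would use the last 50-100 submissions.
history = [
    (88, "yes"), (92, "yes"), (75, "yes"), (81, "yes"),
    (40, "no"), (55, "no"), (72, "no"),
    (65, "unclear"), (70, "unclear"), (68, "unclear"),
]
report = calibration_report(history, cut=70)
for name, value in report.items():
    print(name, value)
```

With a sample this small a single submission moves the rates a lot, which is one reason to inspect the misclassified submissions before touching the threshold.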
Step 4: Track the source, but do not adjust the cut for it
A senior assessment given to a curated referral pool has a fundamentally different score distribution from the same assessment given to an open-call pool. Same role, same questions, different inputs.
The cleanest way to handle this:
- Keep the cut score tied to the assessment, not the channel.
- Track pass rate by source separately in your hiring analytics.
- Use source-level pass rates to decide where to invest sourcing effort, not to bend the bar for one channel.
Teams that quietly lower the cut for "good" sources end up with two unrelated problems: their bar drifts in ways nobody can articulate, and their best sources stop looking as good once the comparison is no longer apples-to-apples.
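Tracking pass rate by source against a single fixed cut might look like the sketch below. The source names, scores, and function name are hypothetical.

```python
from collections import defaultdict

def pass_rate_by_source(submissions, cut):
    """submissions: list of (source, score). One cut is applied to every source."""
    totals = defaultdict(lambda: [0, 0])  # source -> [passed, total]
    for source, score in submissions:
        totals[source][0] += int(score >= cut)
        totals[source][1] += 1
    return {src: passed / total for src, (passed, total) in totals.items()}

# Hypothetical submissions from two sourcing channels.
submissions = [
    ("referral", 82), ("referral", 74), ("referral", 69), ("referral", 91),
    ("open_call", 45), ("open_call", 71), ("open_call", 60),
    ("open_call", 58), ("open_call", 77),
]
rates = pass_rate_by_source(submissions, cut=70)
for source, rate in rates.items():
    print(f"{source}: {rate:.0%}")
```

The function takes one `cut` for all sources on purpose, so the per-source rates stay comparable.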
Step 5: Watch the false-negative side
Most cut-score conversations focus on false positives — candidates who passed but shouldn't have. False negatives — strong candidates rejected by a too-high bar — are invisible by definition. You never see what they would have done in the interview.
Three proxies for false-negative risk:
- Borderline reviewer disagreement. If two reviewers grade the same submission and disagree by more than one point on the rubric, small cut differences may be flipping the outcome on noise.
- Time-on-task outliers. A candidate who scored just under the cut in 25% of the allotted time is a different signal than one who scored just under it in 100% of the time. The first is often a near-miss, the second often a true reject.
- Re-test agreement. For a sample of borderline rejects, offer a short follow-up exercise. If most pass, the cut is too high or the assessment too noisy.
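The first two proxies can be turned into a review queue for near-miss rejects. In the sketch below, the `Submission` fields, the 5-point margin, and the 50% time threshold are illustrative assumptions. Only the one-point reviewer gap comes from the text above.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    candidate_id: str
    score: float            # final score on the assessment scale
    time_fraction: float    # time used divided by time allotted
    reviewer_scores: tuple  # rubric scores from each reviewer

def near_miss_flags(sub, cut, margin=5.0, fast=0.5, max_gap=1.0):
    """Return false-negative risk flags for a reject just under the cut."""
    if not (cut - margin <= sub.score < cut):
        return []  # passed, or too far below the cut to count as a near miss
    flags = []
    if sub.time_fraction <= fast:
        flags.append("fast_finish")
    if max(sub.reviewer_scores) - min(sub.reviewer_scores) > max_gap:
        flags.append("reviewer_disagreement")
    return flags

cut = 70
subs = [
    Submission("a", 68, 0.25, (3, 3)),  # near miss, finished fast
    Submission("b", 67, 1.00, (2, 4)),  # near miss, reviewers two points apart
    Submission("c", 66, 1.00, (3, 3)),  # near miss, no flags
    Submission("d", 50, 0.30, (1, 3)),  # far below the cut
    Submission("e", 75, 0.20, (4, 4)),  # passed
]
review_queue = {s.candidate_id: near_miss_flags(s, cut) for s in subs}
review_queue = {k: v for k, v in review_queue.items() if v}
print(review_queue)
```

Flagged candidates are a natural sample for the follow-up exercise described under re-test agreement.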
Step 6: Re-score every quarter
A cut score that was right in January is unlikely to be right in June. Three drivers of drift:
- The assessment ages. Questions leak into LeetCode-style repos and AI training data. Scores creep up not because candidates got better, but because the test got easier.
- The candidate pool shifts. Layoffs, market changes, and seasonal patterns all move the distribution.
- AI assistance normalizes. As candidates get better at using LLMs, what used to be a 70th-percentile score becomes a 40th-percentile one.
Re-run the calibration each quarter on the most recent 50–100 submissions and adjust. Document the change and the reason — future-you needs the audit trail.
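A simple drift check compares where the cut sits in each quarter's score distribution. The sketch below uses invented scores, and the function names are hypothetical. What counts as "too much" drift is left to the team.

```python
def percentile_rank(scores, value):
    """Percent of scores strictly below value."""
    return 100 * sum(s < value for s in scores) / len(scores)

def drift_report(previous, current, cut):
    """Compare pass rate and the cut's percentile across two periods."""
    def pass_rate(scores):
        return sum(s >= cut for s in scores) / len(scores)
    return {
        "prev_pass_rate": pass_rate(previous),
        "curr_pass_rate": pass_rate(current),
        "cut_percentile_prev": percentile_rank(previous, cut),
        "cut_percentile_curr": percentile_rank(current, cut),
    }

# Hypothetical quarters; a real run would use the most recent 50-100 submissions.
last_quarter = [50, 55, 60, 62, 65, 68, 70, 72, 75, 80]
this_quarter = [58, 63, 66, 70, 72, 74, 76, 78, 81, 85]
drift = drift_report(last_quarter, this_quarter, cut=70)
for name, value in drift.items():
    print(name, value)
```

A falling percentile for the same cut means the same score is easier to reach than it was. That is the signal to investigate leaked questions or a shifted pool before adjusting the number.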
What ClarityHire surfaces
The hiring analytics dashboard renders the score distribution for every assessment alongside per-question pass rates and time-per-question, which is what you need for both the initial calibration and the quarterly re-score. Pass rates can be split by source, role, and time window so you can spot drift before it shows up in the funnel. For teams running AI-graded coding rubrics, the per-rubric-dimension breakdown tells you whether a borderline candidate is borderline on correctness, code quality, or testing — three different decisions, often conflated under a single score.
What to do next
Pick the single assessment you use most often and run the four-step audit this week:
- Write the MQC paragraph for that assessment. If the team disagrees on the wording, pause and reconcile before doing anything else.
- Calculate top-quartile and bottom-quartile pass rates against your current cut.
- Identify the two or three questions doing most of the misclassification work.
- Pick one method — Angoff if you are starting from zero, contrasting-groups if you have a year of outcome data — and produce a defensible cut, in writing, with the calibration evidence attached.
The cut score is the place where measurement turns into decision. Treat it like any other piece of important infrastructure: documented, tested, and revisited on a schedule.