
The Predictive Validity of Hiring Methods: What the Research Actually Says

ClarityHire Team (Editorial) · 5 min read

Why "predictive validity" is the only number that matters

Predictive validity is the correlation, expressed as a coefficient r, between an assessment score and later on-the-job performance. For selection methods the coefficient in practice falls between 0 and 1. An r of 0.0 is a coin flip. An r of 0.5 is genuinely useful. An r of 0.7 is approaching the ceiling of what is measurable in noisy real-world settings.
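The definition above can be computed directly: predictive validity is just the Pearson correlation between assessment scores and a later performance measure. A minimal sketch, with invented numbers purely for illustration:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: work-sample scores vs. manager ratings six months in.
assessment = [62, 71, 55, 80, 68, 74, 59, 77]
performance = [3.1, 3.8, 2.9, 4.2, 3.0, 4.0, 3.3, 3.9]
validity = pearson_r(assessment, performance)  # a value between -1 and 1
```

In a real validation study you would also correct for range restriction and criterion unreliability, which is exactly the step the Sackett correction revisits.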

Almost every hiring debate — "should we add a take-home?", "are personality tests worth it?", "is the resume screen broken?" — collapses into one question once you have validity numbers: does this method actually predict who will be a strong performer?

This post is the short, authoritative version of that literature.

The seminal source

For decades the reference was Schmidt and Hunter's 1998 meta-analysis, "The Validity and Utility of Selection Methods in Personnel Psychology," which synthesized roughly eighty-five years of accumulated research. It ranked predictors and their incremental contribution over general mental ability tests.

In 2022, Sackett, Zhang, Berry, and Lievens published a major correction ("Revisiting Meta-Analytic Estimates of Validity in Personnel Selection") arguing that earlier studies had over-corrected for range restriction and inflated the top of the chart. Their re-estimates pull most methods downward but leave the ordering largely intact.

The numbers below are the corrected estimates, rounded for memorability. Read them as relative rankings, not gospel.

The chart everyone should know

Method                            Approx. validity (r)
Structured interviews             0.42
Job knowledge tests               0.40
Work sample tests                 0.33
Cognitive ability tests           0.31
Integrity tests                   0.31
Conscientiousness (personality)   0.19
Unstructured interviews           0.19
Reference checks                  0.13
Years of education                0.10
Years of experience               0.09
Graphology / handwriting          0.02

A few things jump out:

  • Structured interviews and work samples are at the top. Together, they are the load-bearing pillars of any defensible hiring loop.
  • Unstructured interviews — the kind most teams default to — barely beat education and experience. "We met them and they seemed great" is almost a coin flip.
  • Reference checks are weak. They are useful for disqualifying disasters, not for picking winners.

What the rankings mean operationally

Three implications most teams under-internalize:

1. Structure beats length

A 30-minute structured interview with anchored rating scales outperforms a 90-minute "tell me about yourself" conversation. The structure — same questions, same rubric, same dimensions — does more work than the time investment.
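The "same questions, same rubric, same dimensions" discipline is easy to make concrete as data. A minimal sketch, with all question text and anchors hypothetical rather than taken from any real template:

```python
# A structured interview as data: every candidate gets the same questions,
# each scored on the same anchored 1-4 scale. All content here is a
# hypothetical placeholder, not an actual ClarityHire template.
RUBRIC = [
    {
        "dimension": "problem decomposition",
        "question": "Tell me about a system you split into parts. Why those parts?",
        "anchors": {
            1: "cannot describe a decomposition",
            2: "names the parts but not the reasoning",
            3: "clear parts with explicit trade-offs",
            4: "clear parts, trade-offs, and adapts the split to new constraints",
        },
    },
    {
        "dimension": "communication",
        "question": "Explain a past technical decision as if to a non-engineer.",
        "anchors": {
            1: "jargon-bound, hard to follow",
            2: "understandable with effort",
            3: "clear and well-paced",
            4: "clear, well-paced, and checks for understanding",
        },
    },
]

def score_candidate(ratings):
    """Unweighted mean of anchored scores; ratings maps dimension -> 1-4."""
    return sum(ratings[q["dimension"]] for q in RUBRIC) / len(RUBRIC)
```

Collecting per-dimension scores before any hire/no-hire discussion is the mechanism that keeps the interview structured rather than conversational.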

2. Work samples are the highest-leverage single addition

Most loops include some flavor of interview. Far fewer include a well-designed work sample. Adding one is usually the single biggest validity jump available, and it has the additional benefit of being more legally defensible because it directly samples the job.

3. Cognitive tests work but have adverse impact

Cognitive ability tests predict performance reasonably well across roles. They also tend to produce larger demographic score gaps than work samples do, which is why most modern hiring guidance prefers job-content assessments where possible.

"Authoritative source" — where to read further

For practitioners who want primary sources, the three to know:

  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.
  • Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology, 107(11), 2040–2068.
  • The SIOP Principles (Society for Industrial and Organizational Psychology) — practitioner-facing guidance on validation and fairness.

The Sackett 2022 paper is the most current authoritative source. If you cite one number in a hiring-design doc, take it from there.

What this means for your loop

The shortest defensible recommendation:

  1. Add a work sample for any role where you can design a representative task in under three hours.
  2. Make every interview a structured interview — same questions, same rubric, scores collected before recommendations.
  3. Treat unstructured "vibes check" rounds as social, not predictive. Keep them short and weighted low.
  4. Drop reference checks from go/no-go decisions. Use them as a final sanity layer.

That single set of changes moves a typical loop from a predicted r in the 0.2s into the 0.5s. The compounding effect over a year of hires is enormous.
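The compounding claim can be sanity-checked with the standard formula for the multiple correlation of two predictors: R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2). The 0.30 intercorrelation between interview and work-sample scores below is an illustrative assumption, not a figure from the cited papers:

```python
def composite_validity(r1, r2, r12):
    """Multiple correlation R of two predictors with validities r1, r2
    and predictor intercorrelation r12 (optimal regression weights)."""
    r_sq = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return r_sq ** 0.5

# Structured interview (.42) + work sample (.33), assumed intercorrelation .30.
print(round(composite_validity(0.42, 0.33, 0.30), 2))  # -> 0.47
```

Adding a third moderately correlated method pushes the composite higher still, which is where the "into the 0.5s" estimate comes from; the gain shrinks as predictors overlap.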

How ClarityHire fits

ClarityHire is built around exactly this priority order. The default scorecard template is structured. The assessment templates are work-sample-shaped, not trivia. Integrity signals keep take-home work samples honest in an AI-assisted world. The product opinions are downstream of the research — not the other way around.

Pick methods by validity. Everything else is style.
