
How to Detect AI-Generated Essay Answers in Candidate Assessments

ClarityHire Team (Editorial) · 3 min read

The bad news about generic AI detectors

Off-the-shelf "AI content detectors" — Turnitin, GPTZero, ZeroGPT — have a measured false-positive rate of 4–15% on native English writing, and worse on non-native writers. That is not accurate enough to justify rejecting a candidate.

If you are using a generic AI detector to fail candidates, stop. You are filtering out candidates whose writing reads like a textbook, not candidates who used AI, and that biases you against ESL writers and against junior writers who polish their drafts.

The signals that actually matter

The useful detection signals are behavioural, not lexical:

  1. Time to first keystroke. A candidate who reads the prompt for 15 seconds and then types 800 polished words in 90 seconds did not write those words.
  2. Paste events. A clean paste of the full answer is not a candidate writing live. The integrity layer records every paste event with a length and a timestamp (a minimal capture sketch follows this list).
  3. Edit distance during composition. Live writing produces a messy keystroke timeline — insertions, deletions, cursor jumps. Paste-and-polish produces a flat, append-only timeline.
  4. Tab and focus switches. Did the candidate leave the page mid-question for 40 seconds? They probably went to ask an LLM.
  5. Voice/text mismatch. If you have a recorded interview with the same candidate, compare their spoken vocabulary to their written essay. AI essays use a much wider, more polished register than the candidate's own speech.
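To make these signals concrete, here is a minimal browser-side sketch of the kind of instrumentation involved: time to first keystroke, paste length and timestamp, and tab/focus switches. It is an illustration only; the element id, the event shapes, and the in-memory `record` sink are hypothetical, not ClarityHire's actual API.

```ts
// Minimal sketch: capture time-to-first-keystroke, paste events, and
// tab/focus switches for an essay textarea. All names are illustrative.

type IntegrityEvent =
  | { kind: "first_keystroke"; atMs: number }
  | { kind: "paste"; atMs: number; length: number }
  | { kind: "focus_lost"; atMs: number }
  | { kind: "focus_regained"; atMs: number; awayMs: number };

const startedAt = performance.now();
const events: IntegrityEvent[] = [];
const record = (e: IntegrityEvent): void => {
  events.push(e); // a real integrity layer would stream this to the backend
};

const essay = document.querySelector<HTMLTextAreaElement>("#essay-answer")!;

let firstKeystrokeSeen = false;
essay.addEventListener("keydown", () => {
  if (!firstKeystrokeSeen) {
    firstKeystrokeSeen = true;
    record({ kind: "first_keystroke", atMs: performance.now() - startedAt });
  }
});

essay.addEventListener("paste", (e: ClipboardEvent) => {
  const pasted = e.clipboardData?.getData("text") ?? "";
  // Length + timestamp, as described above; the pasted text itself need not be stored.
  record({ kind: "paste", atMs: performance.now() - startedAt, length: pasted.length });
});

let lostFocusAt: number | null = null;
document.addEventListener("visibilitychange", () => {
  const now = performance.now() - startedAt;
  if (document.hidden) {
    lostFocusAt = now;
    record({ kind: "focus_lost", atMs: now });
  } else if (lostFocusAt !== null) {
    record({ kind: "focus_regained", atMs: now, awayMs: now - lostFocusAt });
    lostFocusAt = null;
  }
});
```

The flat, append-only timeline from signal 3 falls out of the same event stream: log insertions, deletions, and cursor jumps the same way, then look at the ratio of deletions to total edits.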

The signals to ignore

  • "Sounds like AI" lexical features. "Delve", "tapestry", "leveraging" — these are now contaminated. Real candidates use them. Real models use them. You can't separate the two from word choice alone.
  • Perfect grammar. A lot of candidates write through Grammarly. That's not cheating; that's how the modern web writes.
  • Generic structure ("In conclusion…"). Generic structure is what most candidates have been taught to write since high school.

The pattern that works in practice

  1. Run essays with paste-detection on. Reject silent pastes, or warn the candidate they triggered one — your call. What you must not do is record nothing and then guess.
  2. Surface behavioural signals in the report, not a single "AI confidence score". A reviewer who sees "30 seconds reading + one 740-character paste at 0:31" can decide; a reviewer who sees "67% AI" can't. A sketch of what that report can look like follows this list.
  3. Pair the essay with a 5-minute live follow-up. Ask the candidate to explain a specific paragraph they wrote. Paste-and-polish candidates fall apart fast on a follow-up. Real authors don't.
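As a sketch of point 2, the reviewer-facing report can simply be the derived facts plus the raw timeline, rendered as a sentence a human can judge. The field names and summary format below are illustrative, not ClarityHire's actual report schema.

```ts
// Illustrative report shape: concrete behavioural facts instead of one opaque score.

interface EssayReport {
  readingTimeSec: number;                       // prompt shown -> first keystroke
  pastes: { atSec: number; chars: number }[];
  focusLosses: { atSec: number; awaySec: number }[];
  deleteRatio: number;                          // deletions / all edit events; ~0 suggests append-only typing
}

const fmt = (sec: number): string =>
  `${Math.floor(sec / 60)}:${String(Math.round(sec % 60)).padStart(2, "0")}`;

function summarize(r: EssayReport): string {
  const parts = [`${r.readingTimeSec}s reading before first keystroke`];
  for (const p of r.pastes) parts.push(`${p.chars}-char paste at ${fmt(p.atSec)}`);
  for (const f of r.focusLosses) parts.push(`away from tab for ${Math.round(f.awaySec)}s at ${fmt(f.atSec)}`);
  parts.push(`delete ratio ${(r.deleteRatio * 100).toFixed(0)}%`);
  return parts.join(" + ");
}

// The example from point 2, reconstructed:
console.log(summarize({
  readingTimeSec: 30,
  pastes: [{ atSec: 31, chars: 740 }],
  focusLosses: [],
  deleteRatio: 0.01,
}));
// -> "30s reading before first keystroke + 740-char paste at 0:31 + delete ratio 1%"
```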

A note on policy

Tell candidates in advance whether AI tools are allowed. Most companies should say "no for this stage". A small minority should say "yes, and tell us how you used it" — a valid stance for senior roles where AI is part of the actual job. What you must not do is silently penalise AI use that you never prohibited; that is both unfair and indefensible if a candidate appeals.

How ClarityHire surfaces this

Our integrity layer captures paste events, keystroke timelines, and tab-focus signals for every assessment. The reviewer sees the timeline, not a single black-box score. AI-content detection runs as one input among several — never as an auto-reject. Combine with a live follow-up round and you have a defensible, candidate-fair process.

TL;DR

Don't fail candidates on lexical "AI detectors" — false-positive rates are too high. Instead, capture behaviour (paste events, time-on-task, tab focus), surface the timeline to a human reviewer, and gate with a short live follow-up. That combination is what a competent hiring loop looks like in 2026.

ai generated essay detection · chatgpt in hiring · essay assessment · integrity signals · candidate fairness
