ML Engineer Assessment Template

A ready-to-run ML engineering hiring test covering PyTorch, model evaluation, deployment, and MLOps — with live notebook execution.

Duration: 90 minutes
Questions: 9
Level: Senior
Passing Score: 70%

What this template measures

The core skills for an ML engineer hire, covered across multiple-choice, coding, and essay questions.

PyTorch Fluency

nn.Module, autograd, training loops, distributed basics.

Model Evaluation

Metrics, cross-validation, stratification, bias detection.

Feature Engineering

Numerical and categorical encoding, leakage avoidance.

Deployment

FastAPI serving, inference optimization, batching.

MLOps

MLflow, W&B, model versioning, drift detection.

Systems Thinking

Feature stores, training-serving skew, monitoring.

Sample questions from this template

A preview of the questions you'll see when you use this template.

Question 1 · Multiple Choice · Medium

You're training a binary classifier on imbalanced data (95%/5%). Which metric is LEAST informative?

  • A. Precision
  • B. Recall
  • C. F1
  • D. Accuracy
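
A small illustration of why the 95%/5% split matters for this question, assuming a hypothetical classifier that always predicts the majority class. The helper names and the toy labels are made up for the sketch; precision is defined as 0.0 here when no positives are predicted, which is a convention and not the only possible choice.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Conventions for empty denominators: report 0.0 instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data: 5 positives, 95 negatives, and a model that always says "negative".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

metrics = summarize(y_true, y_pred)
print(metrics)
```

In this toy setup the accuracy is high even though the model never finds a positive example, while recall and F1 drop to zero.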
Question 2 · Coding · Hard · Python (PyTorch)

Train a simple MLP classifier on MNIST (or similar). Include:

  • Train/val/test split with stratification
  • Training loop with early stopping
  • Evaluation on held-out test set
  • Confusion matrix + classification report
  • Save model weights to disk
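
One way the early-stopping requirement could be sketched, independent of any framework. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation losses are hypothetical and only illustrate the control flow; this is not a reference solution for the question.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            # In a PyTorch loop, this is where the best weights would be
            # checkpointed, e.g. by saving model.state_dict().
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training run.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best_loss)
```

With these made-up losses the loop stops before reaching the final, lower value, which shows the tradeoff that `patience` controls.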

Question 3 · Coding · Hard · Python

Wrap a trained model as a FastAPI inference service:

  • POST /predict accepts input features as JSON
  • Batching if multiple requests come in within 50 ms
  • Returns { prediction, probability, model_version }
  • Includes /healthz and /metrics endpoints
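
The 50 ms batching requirement can be approached with a small asyncio queue that sits between the request handler and the model. The sketch below uses only the standard library; `MicroBatcher`, `fake_predict_batch`, and the `max_batch` cap are hypothetical names and choices, and the "model" is a stand-in sigmoid over the feature sum. In a FastAPI service, an async POST /predict handler could await `batcher.submit(features)`.

```python
import asyncio
import math


class MicroBatcher:
    """Collect requests for up to `window_s` seconds, then run them as one batch."""

    def __init__(self, predict_batch, window_s=0.05, max_batch=32):
        self.predict_batch = predict_batch
        self.window_s = window_s
        self.max_batch = max_batch
        self.queue = asyncio.Queue()

    async def submit(self, features):
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((features, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            items = [await self.queue.get()]        # block until a first request
            deadline = loop.time() + self.window_s  # the window starts now
            while len(items) < self.max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    items.append(
                        await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self.predict_batch([f for f, _ in items])
            for (_, fut), out in zip(items, outputs):
                if not fut.done():
                    fut.set_result(out)


batch_sizes = []  # recorded only so the demo can show how requests were grouped


def fake_predict_batch(batch):
    """Stand-in model: sigmoid of the feature sum."""
    batch_sizes.append(len(batch))
    results = []
    for features in batch:
        p = 1.0 / (1.0 + math.exp(-sum(features)))
        results.append({"prediction": int(p >= 0.5),
                        "probability": p,
                        "model_version": "demo-0"})
    return results


async def demo():
    batcher = MicroBatcher(fake_predict_batch)  # created inside the running loop
    worker = asyncio.create_task(batcher.run())
    inputs = [[-1.0], [0.0], [1.0]]
    results = await asyncio.gather(*(batcher.submit(x) for x in inputs))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return results


results = asyncio.run(demo())
print([r["prediction"] for r in results], batch_sizes)
```

The window starts when the first request of a batch arrives, so a lone request waits at most about 50 ms; a production version would also need error propagation to the waiting futures if the model call fails.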

Question 4 · Essay · Hard

Your deployed model's accuracy drops 10% over 3 months. Walk through how you'd investigate — what you'd check first, what tools, how you'd distinguish data drift from concept drift.
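
As one concrete check a candidate might mention for the data-drift side of this question, the sketch below computes a Population Stability Index (PSI) for a single numeric feature. The `psi` helper, the equal-width binning, the `eps` floor, and the synthetic data are all assumptions for illustration. A shift in input distributions like this points toward data drift; concept drift (a change in the relationship between inputs and labels) cannot be seen from inputs alone and generally needs freshly labeled data.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic feature values: a training-time baseline and two "production" samples.
baseline = [i / 100 for i in range(100)]
same_distribution = list(baseline)
shifted = [x + 0.5 for x in baseline]

psi_same = psi(baseline, same_distribution)
psi_shifted = psi(baseline, shifted)
print(psi_same, psi_shifted)
```

A commonly quoted rule of thumb treats PSI above roughly 0.25 as a substantial shift, but the threshold is a convention and should be tuned per feature.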

Scoring rubric

How candidates are evaluated on this template.

Dimension | Description | Weight
Training Correctness | Loop is correct, avoids leakage, and evaluates on held-out data. | 30%
Deployment | Service is production-shaped with monitoring. | 25%
Evaluation Rigor | Metrics match the problem; cross-validation is done correctly. | 20%
MLOps | Versioning, monitoring, drift awareness. | 15%
Communication | Explains tradeoffs clearly in writing. | 10%
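
The page does not say how the dimension scores are combined. Assuming a plain weighted average compared against the 70% passing score, the arithmetic would look like the sketch below. The candidate's per-dimension scores are made up, and `weighted_score` is a hypothetical helper.

```python
# Weights taken from the rubric above; they sum to 1.0.
RUBRIC_WEIGHTS = {
    "Training Correctness": 0.30,
    "Deployment": 0.25,
    "Evaluation Rigor": 0.20,
    "MLOps": 0.15,
    "Communication": 0.10,
}
PASSING_SCORE = 0.70  # the template's stated passing score


def weighted_score(scores):
    """Weighted average of per-dimension scores, each in the range 0.0 to 1.0.

    Assumption: dimensions combine linearly; the template may score differently.
    """
    return sum(RUBRIC_WEIGHTS[dim] * scores[dim] for dim in RUBRIC_WEIGHTS)


# Hypothetical candidate: strong on training, weaker on deployment and MLOps.
candidate = {
    "Training Correctness": 0.9,
    "Deployment": 0.6,
    "Evaluation Rigor": 0.8,
    "MLOps": 0.5,
    "Communication": 1.0,
}

total = weighted_score(candidate)
passed = total >= PASSING_SCORE
print(f"{total:.3f}", passed)
```

Under this assumption, a strong result on the heavily weighted training dimension can offset weaker deployment and MLOps scores.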

Frequently asked questions

Is a GPU available in the sandbox?

The sandbox is CPU-only by default to keep assessment times consistent. Small models train fine within 90 minutes. GPU variants are available for enterprise accounts.

Can I customize this template?

Yes. Every question, time limit, weighting, and rubric dimension is fully editable. Use the template as a starting point and tailor it to your role and seniority level.

Does this template include AI cheat detection?

Yes. All ClarityHire assessment templates ship with code coherence AI, keystroke biometrics, and paste detection enabled by default. You can adjust the integrity level per role.

Can candidates see sample questions before starting?

Yes. Each template supports unscored practice questions so candidates can warm up before the real assessment begins. That way you measure skill, not test anxiety.

Launch Your ML Engineering Assessment Today

Customize this template and invite candidates in minutes.