Data Engineer Assessment Template

A ready-to-run data engineering hiring test covering SQL, schema design, pipeline design, and data quality — with live SQL execution.

Duration: 75 minutes
Questions: 10
Level: Mid-Level
Passing Score: 70%

What this template measures

The core skills for a data engineer hire, covered across multiple-choice, coding, and essay questions.

SQL Fluency

Joins, window functions, CTEs, analytical queries.

Schema Design

Normalization, star schemas, SCD, partitioning.

Pipeline Design

DAG design and scheduling with Airflow, dbt, and Dagster.

Data Quality

Null handling, deduplication, freshness, lineage.

Streaming Basics

Kafka, Kinesis, exactly-once semantics.

Python Data Stack

pandas, PySpark basics.

Sample questions from this template

A preview of the questions you'll see when you use this template.

Question 1 (Multiple Choice, Medium)

In a star schema, fact tables typically contain:

  • A. Descriptive attributes about business entities
  • B. Foreign keys and measurable metrics
  • C. Slowly changing data with history
  • D. Only denormalized dimensions

Question 2 (Coding, Medium, SQL)

Given `sessions(user_id, session_start, page)`, write a query that returns each user's first 3 pages in order of time for each session.
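One possible approach to the question above is sketched here, run through Python's built-in `sqlite3` module so it is self-contained. The sketch makes several assumptions that are not in the original question: each row is treated as one page view whose timestamp is `session_start`, the result is ordered per user because the table has no session identifier, the sample rows are invented, and SQLite (3.25 or later, for window functions) stands in for the PostgreSQL default.

```python
import sqlite3

# In-memory database with a few invented page views (assumption: one row per page view).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, session_start TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:00:00", "/home"),
        (1, "2024-01-01 09:01:00", "/pricing"),
        (1, "2024-01-01 09:02:00", "/docs"),
        (1, "2024-01-01 09:03:00", "/signup"),
        (2, "2024-01-01 10:00:00", "/home"),
        (2, "2024-01-01 10:05:00", "/blog"),
    ],
)

# Number each user's pages by time, then keep only the first three.
FIRST_THREE_PAGES = """
WITH ranked AS (
    SELECT
        user_id,
        page,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY session_start
        ) AS page_order
    FROM sessions
)
SELECT user_id, page_order, page
FROM ranked
WHERE page_order <= 3
ORDER BY user_id, page_order
"""

first_pages = conn.execute(FIRST_THREE_PAGES).fetchall()
for row in first_pages:
    print(row)
```

`ROW_NUMBER()` is used rather than `RANK()` so that two page views with the same timestamp still receive distinct positions; if the real table carried a session identifier, it would be added to the `PARTITION BY` clause.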

Question 3 (Coding, Hard, SQL)

Given `orders(id, user_id, amount, created_at)`, write a query that returns each user's: running 7-day total, running 28-day total, and rank by 28-day total across all users. Single query using window functions.
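A hedged sketch of one way to read this question follows, again using Python's `sqlite3` module. The assumptions are mine, not the template's: the trailing windows are anchored at each user's most recent order, `created_at` holds ISO dates, the sample rows are invented, and SQLite (3.28 or later, for `RANGE` frames with offsets) stands in for PostgreSQL. In PostgreSQL the frame would more naturally be written with an interval offset such as `RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, user_id INTEGER, amount INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, "2024-01-01"),
        (2, 1, 50, "2024-01-20"),
        (3, 1, 30, "2024-01-25"),
        (4, 1, 20, "2024-01-28"),
        (5, 2, 500, "2024-01-10"),
        (6, 2, 10, "2024-01-27"),
    ],
)

# julianday() turns the date into a number of days, so a RANGE frame of
# "6 PRECEDING" covers the current day plus the six days before it.
RUNNING_TOTALS = """
WITH running AS (
    SELECT
        user_id,
        SUM(amount) OVER (
            PARTITION BY user_id
            ORDER BY julianday(created_at)
            RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS total_7d,
        SUM(amount) OVER (
            PARTITION BY user_id
            ORDER BY julianday(created_at)
            RANGE BETWEEN 27 PRECEDING AND CURRENT ROW
        ) AS total_28d,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY created_at DESC, id DESC
        ) AS recency
    FROM orders
)
SELECT
    user_id,
    total_7d,
    total_28d,
    RANK() OVER (ORDER BY total_28d DESC) AS rank_28d
FROM running
WHERE recency = 1          -- keep each user's latest order only
ORDER BY rank_28d, user_id
"""

totals = conn.execute(RUNNING_TOTALS).fetchall()
for row in totals:
    print(row)
```

A `RANGE` frame is chosen over `ROWS` because the windows are defined by elapsed days, not by a fixed number of orders. A candidate might equally anchor the windows at the current date instead of each user's latest order; the question leaves that choice open.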

Question 4 (Coding, Medium, Python)

Write an Airflow DAG definition that:

  • Runs daily at 2am UTC
  • Extracts yesterday's data from Postgres
  • Loads it into S3 as parquet
  • Triggers a downstream dbt model on success
  • Retries 3 times with exponential backoff on failure

Question 5 (Essay, Hard)

Your dbt pipeline produces incorrect results one day per week. Walk through how you'd investigate — what you'd check first, what tools you'd use, how you'd reproduce.

Scoring rubric

How candidates are evaluated on this template.

| Dimension | Description | Weight |
| --- | --- | --- |
| SQL Correctness | Queries return correct, efficient results. | 40% |
| Schema Design | Models reflect real business needs, handle change. | 20% |
| Pipeline Design | Idempotent, observable, failure-friendly DAGs. | 20% |
| Data Quality Sense | Anticipates nulls, dupes, and freshness issues. | 10% |
| Communication | Explains tradeoffs in writing clearly. | 10% |
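
To illustrate how the rubric weights and the 70% passing score might combine, here is a small worked example in Python. It assumes the overall score is a plain weighted sum of per-dimension scores on a 0 to 100 scale, which this page does not state explicitly; the candidate scores and the names `RUBRIC_WEIGHTS` and `overall_score` are hypothetical.

```python
# Rubric weights in percent, taken from the rubric above.
RUBRIC_WEIGHTS = {
    "SQL Correctness": 40,
    "Schema Design": 20,
    "Pipeline Design": 20,
    "Data Quality Sense": 10,
    "Communication": 10,
}
PASSING_SCORE = 70  # percent


def overall_score(dimension_scores):
    """Weighted sum of per-dimension scores (each 0-100), as a percentage.

    Assumption: the platform combines dimensions linearly by weight.
    """
    weighted = sum(
        RUBRIC_WEIGHTS[name] * dimension_scores[name] for name in RUBRIC_WEIGHTS
    )
    return weighted / 100


# Hypothetical candidate: strong SQL and writing, weaker data quality instincts.
candidate = {
    "SQL Correctness": 80,
    "Schema Design": 70,
    "Pipeline Design": 60,
    "Data Quality Sense": 50,
    "Communication": 90,
}

score = overall_score(candidate)
passed = score >= PASSING_SCORE
print(score, passed)
```

Under this assumption, the 40% weight on SQL Correctness means a strong SQL result can offset a weak score in one of the 10% dimensions, as it does for this hypothetical candidate.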

Frequently asked questions

Which SQL flavor does this template use?

PostgreSQL by default, with DuckDB for analytical queries. Variants for BigQuery, Snowflake, and Redshift are available.

Can I customize this template?

Yes. Every question, time limit, weighting, and rubric dimension is fully editable. Use the template as a starting point and tailor it to your role and seniority level.

Does this template include AI cheat detection?

Yes. All ClarityHire assessment templates ship with code coherence AI, keystroke biometrics, and paste detection enabled by default. You can adjust the integrity level for each role.

Can candidates see sample questions before starting?

Yes. Each template supports unscored practice questions so candidates warm up before the real assessment begins. That way you measure skill, not test anxiety.

Launch Your Data Engineering Assessment Today

Customize this template and invite candidates in minutes.