Free guide

The Analytics Engineering Interview Kit

The questions that actually come up in analytics engineering screens — with the answers hiring managers are listening for.

Get the full kit

$19

Read the whole guide free below. Want it to keep? The printable PDF kit adds a ton more: a 75-question interview bank with worked answers, a SQL pattern cheat-sheet, strong-vs-weak answer breakdowns, a portfolio walkthrough rubric, and a printable 2-week study plan — instant download.

Most interview prep fails because it's generic. “Practice SQL” isn't a plan. This kit is specific: the actual shape of an analytics engineering loop, the exact question categories, and what separates a pass from a fail in each one. Written by Eric Provencio — analytics engineer at Disney, Hulu, Nike, Peloton, and Gopuff, and a 5.0-rated mentor.

You don't need to memorize answers. You need to recognize the patterns, drill them until they're automatic, and have one project you can talk through cold. That's the whole game.

The shape of the loop

Almost every analytics engineering interview has four parts. Know which one you're in:

Live SQL screen — a shared editor, 30–45 minutes, real questions against a small schema. This is the gate. Fail it and nothing else matters.
Data modeling round — usually conceptual or a whiteboard: design a model for X, explain a star schema, handle a slowly changing dimension.
Portfolio walkthrough — you drive, 10–15 minutes, explaining a project while they poke at your decisions.
Behavioral — “tell me about a time,” stakeholder conflict, a mistake you made.

The single biggest mistake candidates make: over-indexing on LeetCode-style algorithms. Analytics engineering screens almost never ask them. The SQL screen IS the algorithm round. Spend your time there.

Part 1 — The SQL screen

These are the patterns that show up over and over. If you can do all five live, you'll clear most screens.

1. Aggregation with a twist (GROUP BY + HAVING)

From an orders table, find customers with more than 3 orders and total spend over $500.

select customer_id, count(*) as orders, sum(amount) as total_spend
from orders
group by customer_id
having count(*) > 3 and sum(amount) > 500;

What they're checking: do you know HAVING filters after aggregation (not WHERE).

2. The “latest record per group” (window functions)

Return the most recent order per customer.

select * from (
  select *,
    row_number() over (partition by customer_id order by order_date desc) as rn
  from orders
) where rn = 1;

This one comes up constantly. If you can write it without hesitating, you're ahead of most candidates.

3. Period-over-period (LAG)

Show each month's revenue and the % change from the prior month.

with monthly as (
  select date_trunc('month', order_date) as month, sum(amount) as revenue
  from orders group by 1
)
select month, revenue,
  round(100.0 * (revenue - lag(revenue) over (order by month))
        / lag(revenue) over (order by month), 1) as pct_change
from monthly;

4. The join that doesn't explode

Count orders per customer, including customers with zero orders. The trap is an inner join (drops the zero-order customers) or a fan-out that double counts. The answer is a LEFT JOIN from customers, COUNT(orders.id) (not COUNT(*)), grouped by customer.

5. Deduplication

Given duplicate rows, return distinct records keeping the latest. Same ROW_NUMBER() pattern as #2 — partition by the natural key, order by updated-at, keep rn = 1.

How to actually pass the screen: narrate as you type. Say “I'll start with a CTE for the monthly rollup, then…” Interviewers score your reasoning, not just the final query. Silence reads as not knowing.

Part 2 — Data modeling

These separate junior from senior. Have crisp, one-paragraph answers ready.

Fact vs dimension table: Facts store measurable events (one row per order, click, payment) with numeric measures and foreign keys. Dimensions store the descriptive context you slice by. Facts are tall and narrow; dimensions are wide and short.
Grain: what one row represents. It's the first thing you declare when modeling a fact table — “one row per order line per day” — because mixing grains silently double-counts every metric.
Changing attributes over time: Slowly Changing Dimension. Type 1 overwrites (no history); Type 2 keeps history by closing the old row and inserting a new one with valid-from/valid-to dates. Type 2 is the one they want you to know.
“Walk me through modeling X”: use Kimball's four steps out loud — pick the business process, declare the grain, choose the dimensions, choose the facts.

Part 3 — The portfolio walkthrough

This is where offers are won or lost, and it's the part people prep least.

Lead with the problem, not the tech. “The business couldn't trust revenue because three teams defined it differently” beats “I used dbt and BigQuery.”
Show the repo and the lineage. A real GitHub repo with staging → marts layers and a dbt DAG signals you've done the actual job.
Have a decision to defend. Explain one modeling trade-off: “I went Type 2 here because finance needed point-in-time reporting.”
Rehearse out loud ten times. The walkthrough fails on delivery, not content.

Part 4 — Behavioral

Use STAR (Situation, Task, Action, Result) and keep answers under two minutes. Prepare three stories you can adapt:

A time you found and fixed a data quality issue.
A time you disagreed with a stakeholder about a metric definition.
A mistake you made in production and what you changed afterward.

The mistake story builds the most trust: “I shipped a model that double-counted because I mixed grains; now I add a uniqueness test on the grain of every fact” is a great answer.

The disqualifiers

Going silent during the SQL screen.
Not being able to explain your own portfolio project's decisions.
Confusing WHERE and HAVING, or COUNT(*) vs COUNT(column) with NULLs.
Talking tools instead of business impact.
No questions for the interviewer (always have two).

Your 7-day drill plan

Days 1–2: The five SQL patterns above, timed, until automatic.
Day 3: Window functions deep — ROW_NUMBER, RANK, LAG/LEAD, running totals.
Day 4: Modeling answers — say each one out loud in one paragraph.
Day 5: Portfolio walkthrough rehearsal (×3, out loud).
Day 6: Behavioral stories in STAR format.
Day 7: Full mock — one SQL problem, one modeling question, one walkthrough.

Keep going — the reps are the work.

This kit is the map. The platform has the graded SQL exercises mapped to interview difficulty, a data-modeling track, and a BigQuery + dbt capstone you can walk through in the portfolio round.

Start free training See the course →

Get the updates

Want more like this?

Occasional, practical analytics-engineering breakdowns and interview tips. No spam.

No spam. Unsubscribe anytime.