Idempotency
The property that running a pipeline (or rebuilding a model) multiple times produces the same result as running it once — no duplicates, no drift. A core requirement for reliable data.
A pipeline is idempotent if re-running it doesn't change the outcome. Reload yesterday's data twice and you should get one copy, not two. This matters because in the real world jobs fail halfway, get retried, and backfill overlapping windows — and none of that should corrupt your tables.
The usual enemy of idempotency is naive `INSERT` logic that appends every run. The fixes: build full-refresh tables from a deterministic SELECT, or use merge/upsert on a unique key for incremental models so re-processing a window replaces rather than duplicates.
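The contrast between a blind append and a keyed upsert can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production loader: the table names `orders_naive` and `orders`, the `order_id` key, and the sample batch are all hypothetical, and the `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or newer. Other warehouses express the same idea with `MERGE`.

```python
import sqlite3

def load_naive(conn, rows):
    # Blind append: every re-run adds another copy of the same rows.
    conn.executemany(
        "INSERT INTO orders_naive (order_id, amount) VALUES (?, ?)", rows
    )

def load_upsert(conn, rows):
    # Upsert keyed on order_id: a re-run overwrites the existing row in place.
    conn.executemany(
        """
        INSERT INTO orders (order_id, amount) VALUES (?, ?)
        ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount
        """,
        rows,
    )

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the naive one has no key, the other has a unique key.
conn.execute("CREATE TABLE orders_naive (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")

batch = [(1, 10.0), (2, 25.5)]
for _ in range(2):  # simulate a retry of the same batch
    load_naive(conn, batch)
    load_upsert(conn, batch)

naive_count = conn.execute("SELECT COUNT(*) FROM orders_naive").fetchone()[0]
upsert_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(naive_count, upsert_count)  # prints "4 2"
```

After the simulated retry, the append-only table holds four rows while the upserted table still holds two. The unique key is what lets the database recognize a re-processed row as the same row.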
Designing for idempotency is what lets you sleep through a 3 a.m. retry. It's also a favorite interview question because it separates people who've run production pipelines from people who haven't.
Production pipelines fail and retry constantly. Idempotency is what makes those retries safe instead of a source of silent duplication and drift. It is the difference between robust data infrastructure and fragile data infrastructure.
It's also a strong interview signal: candidates who design for idempotency have usually operated real pipelines.
- Append-only INSERT logic that duplicates rows on every re-run or backfill.
- Non-deterministic transformations (e.g. relying on `current_timestamp` or on an unspecified row order) that produce different output each run.
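One way to avoid the second pitfall is to pass the logical run date in as a parameter and to break sort ties explicitly. The sketch below assumes a hypothetical `latest_per_user` transformation and made-up event records; the field names are illustrative only.

```python
from datetime import date

def latest_per_user(events, run_date):
    """Keep each user's most recent event and stamp rows with the logical run date."""
    # Sort on (user_id, event_time, event_id) so ties break the same way every run.
    ordered = sorted(
        events, key=lambda e: (e["user_id"], e["event_time"], e["event_id"])
    )
    latest = {}
    for e in ordered:
        latest[e["user_id"]] = e  # later rows in the sort overwrite earlier ones
    # Stamp with the run_date argument, not date.today(), so a late retry matches.
    return [
        {"user_id": u, "event_id": e["event_id"], "loaded_for": run_date.isoformat()}
        for u, e in sorted(latest.items())
    ]

# Hypothetical input: two events for user "a" share the same event_time.
events = [
    {"user_id": "a", "event_id": 2, "event_time": "2024-01-01T10:00"},
    {"user_id": "a", "event_id": 1, "event_time": "2024-01-01T10:00"},
    {"user_id": "b", "event_id": 3, "event_time": "2024-01-01T09:00"},
]

first = latest_per_user(events, date(2024, 1, 1))
# Same data arriving in a different order, as it might on a retry.
second = latest_per_user(list(reversed(events)), date(2024, 1, 1))
print(first == second)
```

Because the tie on `event_time` is resolved by `event_id` and the timestamp comes from the caller, the output depends only on the input rows and the run date, not on arrival order or wall-clock time.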
- How do I make a data pipeline idempotent?
  Build from deterministic SELECTs and use merge/upsert on a unique key (rather than blind INSERT), so re-processing the same window replaces rows instead of duplicating them.
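The full-refresh half of that answer might look like the sketch below, again using Python's `sqlite3`. The `raw_orders` source table, the `daily_revenue` target, and the sample rows are assumptions made for illustration. A real warehouse would more likely use `CREATE OR REPLACE TABLE` or a dbt table materialization.

```python
import sqlite3

def rebuild_daily_revenue(conn):
    # Full refresh: discard the old table and rebuild it from source every run.
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
# Hypothetical source table with a few sample rows.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (2, "2024-01-01", 5.0), (3, "2024-01-02", 7.5)],
)

rebuild_daily_revenue(conn)
rebuild_daily_revenue(conn)  # second run replaces the table instead of appending

result = conn.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(result)
```

Running the rebuild twice leaves one row per `order_date`, because the target is always derived from the source rather than added to.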
