Surrogate key
A warehouse-generated unique identifier for a dimension row — independent of any source system ID. It keeps joins stable when natural keys are messy, reused, or change.
A surrogate key is an artificial primary key you generate inside the warehouse, typically by hashing the natural key (and, for slowly changing dimensions, the valid-from date). It has no business meaning — its only job is to uniquely and stably identify a dimension row.
You use them because natural keys from source systems are unreliable: they get reused, they change format, they collide across systems, and for Type 2 dimensions a single natural key maps to multiple historical rows. A surrogate key sidesteps all of that and gives the fact table one clean column to join on.
In dbt the common approach is `dbt_utils.generate_surrogate_key([...])`, which produces a deterministic hash from the columns you specify.
Surrogate keys are what make Type 2 history and multi-source integration possible — they're the stable hinge every fact-to-dimension join swings on.
They also protect you from a whole class of production incidents where a source system 'helpfully' reuses or reformats an ID and quietly corrupts your joins.
- Joining facts to dimensions on natural keys, which breaks the moment a source reuses or changes them.
- Generating non-deterministic surrogate keys (e.g. random UUIDs) so rebuilds produce different keys and break downstream references.
- Surrogate key vs natural key — what's the difference?
- A natural key comes from the source system and carries business meaning (e.g. an email or order number). A surrogate key is a warehouse-generated, meaningless, stable identifier. Use surrogate keys for joins; keep natural keys as attributes.
- ArchitectureImplementing Data Products in a Data Mesh: Essential Strategies and Practices
- ArchitectureWhat Is a Lakehouse? Architecture & Use-Cases Explained in Depth
- FundamentalsChange Data Capture Patterns for Analytics Pipelines: The Complete Guide
- FundamentalsSurrogate vs Natural Keys: Choosing the Right Primary Key for Databases
Go deeper in the Data Modeling hub.
Learn this by building, not memorizing.
Definitions get you the vocabulary. The platform gets you the skill — graded exercises, real projects, and a portfolio capstone.
