Surrogate key vs natural key — what's the difference?

A natural key comes from the source system and carries business meaning (e.g. an email or order number). A surrogate key is a warehouse-generated, meaningless, stable identifier. Use surrogate keys for joins; keep natural keys as attributes.

GlossaryData modeling

Surrogate key

A warehouse-generated unique identifier for a dimension row — independent of any source system ID. It keeps joins stable when natural keys are messy, reused, or change.

A surrogate key is an artificial primary key you generate inside the warehouse, typically by hashing the natural key (and, for slowly changing dimensions, the valid-from date). It has no business meaning — its only job is to uniquely and stably identify a dimension row.

You use them because natural keys from source systems are unreliable: they get reused, they change format, they collide across systems, and for Type 2 dimensions a single natural key maps to multiple historical rows. A surrogate key sidesteps all of that and gives the fact table one clean column to join on.

In dbt the common approach is `dbt_utils.generate_surrogate_key([...])`, which produces a deterministic hash from the columns you specify.

Why it matters

Surrogate keys are what make Type 2 history and multi-source integration possible — they're the stable hinge every fact-to-dimension join swings on.

They also protect you from a whole class of production incidents where a source system 'helpfully' reuses or reformats an ID and quietly corrupts your joins.

Common mistakes

Joining facts to dimensions on natural keys, which breaks the moment a source reuses or changes them.
Generating non-deterministic surrogate keys (e.g. random UUIDs) so rebuilds produce different keys and break downstream references.

FAQ

Surrogate key vs natural key — what's the difference?: A natural key comes from the source system and carries business meaning (e.g. an email or order number). A surrogate key is a warehouse-generated, meaningless, stable identifier. Use surrogate keys for joins; keep natural keys as attributes.

Go deeper

Related terms

Go deeper in the Data Modeling hub.

Learn this by building, not memorizing.

Definitions get you the vocabulary. The platform gets you the skill — graded exercises, real projects, and a portfolio capstone.

Try free practice See the course →