The analytics engineering glossary.
Plain-English definitions of the terms that actually come up on the job — no textbook filler. Written by a working analytics engineer.
Data mart
A curated, business-facing set of tables built for a specific team or subject area — finance, marketing, product — that analysts and BI tools query directly.
Dimension table
A table of descriptive attributes — customers, products, dates — that you join to a fact table to slice and label its measures. One row per entity, identified by a surrogate key.
Dimensional modeling
The discipline of structuring warehouse data into facts and dimensions (usually star schemas) optimized for analytical querying and human comprehension, popularized by Ralph Kimball.
Fact table
The central table in a star schema that stores measurable business events — one row per event (an order, a click, a payment) — with numeric measures and foreign keys to dimension tables.
Grain
The precise definition of what a single row in a table represents. Declaring the grain — e.g. “one row per order line per day” — is the first and most important step in modeling a fact table.
Slowly changing dimension (SCD)
A pattern for handling dimension attributes that change over time. Type 1 overwrites the old value; Type 2 keeps history by adding a new row with valid-from/valid-to dates.
Star schema
A dimensional model with one central fact table joined to several dimension tables — drawn out, it looks like a star. The default warehouse design for fast, understandable analytics.
Surrogate key
A warehouse-generated unique identifier for a dimension row — independent of any source system ID. It keeps joins stable when natural keys are messy, reused, or change.
dbt model
A single SQL SELECT statement in a dbt project that defines one transformed table or view. dbt handles the DDL, dependencies, and materialization so you just write the SELECT.
Incremental model
A dbt materialization that, after the first full build, only processes and appends new or changed rows on each run — used to keep large fact tables affordable and fast to refresh.
Materialization
How dbt persists a model in the warehouse — as a view, a table, an incremental table, or ephemeral (inlined CTE). Chosen per model to balance freshness, cost, and query speed.
Staging model
The first dbt layer: one staging model per source table that does light cleanup — renaming columns, casting types, basic standardization — with no joins or business logic.
Common table expression (CTE)
A named, temporary result set defined with a WITH clause that you reference later in the same query — the primary tool for writing readable, layered, reviewable SQL.
Window function
A SQL function that computes across a set of rows related to the current row — running totals, rankings, period-over-period — without collapsing them into one row like GROUP BY does.
The Analytics Engineering Interview Kit
The SQL, dbt, and data-modeling questions that actually come up in screens — with worked answers. Free, straight to your inbox.
No spam. Unsubscribe anytime.
