30 dbt Interview Questions and Answers (From Real Screens)

Get 30 practitioner-grade dbt interview questions and answers grouped by fundamentals, modeling, tests/deployment, and scenarios—with real SQL/Jinja examples.

Here are 30 practitioner-grade dbt interview questions and answers you can use to prep quickly and go deep. We group them by fundamentals, modeling, tests/deployment, and scenario-based problem solving. Each answer is concise and includes realistic SQL/Jinja/YAML where it matters. If youre preparing for a dbt interview and want more breadth, see the Interviews topic hub and our broader Interview Prep: 50 Questions and Answers for Analytics Engineer Roles.

Fundamentals: 10 questions

1) What is Data Build Tool (DBT)?

dbt is the data build tool that lets you transform data in your data warehouse using SQL and Jinja. It manages dependencies between models, compiles templated SQL, runs statements, and tests results. Teams use it to turn raw data into well-modeled, documented tables and views.

2) Where does dbt fit in the data engineering workflow?

dbt sits in ELT after load. Upstream ingestion puts raw data into the data warehouse; dbt handles data transformation and modeling: staging, marts, tests, docs, and deployment. It complements your orchestrator and ingestion tools in a modern data pipeline.

3) How does dbt work under the hood?

dbt compiles your Jinja+SQL to pure SQL, builds a DAG via ref()/source(), and executes statements in topological order. The execution of dbt models can be materialized as views, tables, or an incremental model. Think: parse compile run test document.

# Example: compiled order
$ dbt build --select dim_orders
# dbt compiles Jinja to SQL then runs it in your warehouse

4) How do ref() and source() work?

ref('model_name') declares dependencies and points to built models; source('name','table') points to declared sources and protects against schema drift. This enables environment-agnostic SQL and a reliable DAG.

-- models/stg_orders.sql
with src as (
  select *
  from {{ source('sales', 'orders') }}
)
select * from src
# models/sources.yml
version: 2
sources:
  - name: sales
    schema: raw
    tables:
      - name: orders

The phrase source in dbt refers to that YAML declaration of upstream objects.

5) Compare materializations: view vs table vs incremental vs ephemeral.

MaterializationStorageProsTypical use
viewNo table storageAlways fresh, simpleLight transforms, low cost
tablePhysical tableFast reads, stableHeavy transforms, marts
incrementalPhysical table, append/mergeScales for large dataBig fact tables, CDC
ephemeralNo relation; inlined CTENo objects clutterReusable logic chunks

6) Whats a seed and when use it?

A seed is a CSV checked into your dbt project that dbt loads as a small table (lookup lists, country codes, SLA thresholds). Its version-controlled and deployable like models.

# seeds/sla_thresholds.csv
entity,priority,max_hours
orders,high,4
orders,medium,24
# dbt_project.yml
seeds:
  +schema: reference

7) What is a snapshot?

Snapshots capture slowly changing records by tracking field changes over time (SCD2 behavior). They use check or timestamp strategies to version records.

-- snapshots/customers.sql
{% snapshot customers_snapshot %}
  {{ config(target_schema='snapshots', unique_key='customer_id', strategy='check', check_cols=['email','status']) }}
  select * from {{ source('crm','customers') }}
{% endsnapshot %}

8) How do Jinja and macros help?

Jinja lets you parameterize SQL; macros package reusable logic. Use them to DRY up joins and constraints. For deeper patterns, see dbt Macros & Jinja Tips Every Analytics Engineer Should Know: Expert Guide. Using dbt macros wisely keeps modeling consistent.

-- macros/surrogate_key.sql
{% macro surrogate_key(cols) %}
  md5(concat_ws('||', {% for c in cols %}coalesce({{ c }}, ''), {% endfor %} ''))
{% endmacro %}

9) How do dbt tests work?

Generic tests (e.g., unique, not_null) and custom tests run SQL assertions. Put them in YAML near models and sources.

version: 2
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests: [not_null, unique]

10) Whats the difference between dbt run and dbt build?

dbt run executes models only. dbt build executes models, tests, and snapshots for a selected graph. Prefer dbt build in CI to validate end-to-end. You can also run dbt with selectors and state, or literally run dbt via job orchestration.

Modeling: 8 questions

11) How would you layer a dbt project?

Common pattern: stg_* (1:1 modeling to raw data, type casting, light conforming), int_* (intermediate joins/rollups), dim_* and fct_* (business-ready marts). Keep names consistent, one primary key per model, and document each dbt model.

-- models/staging/sales/stg_orders.sql
{{ config(materialized='view') }}
select
  cast(id as bigint) as order_id,
  cast(created_at as timestamp) as created_at,
  {{ surrogate_key(['id']) }} as order_sk
from {{ source('sales','orders') }}

12) How do you ensure idempotency and handle late-arriving data?

Use deterministic keys, avoid now() in logic, and prefer merge-based incremental models with a strict unique_key. For late-arriving facts, reprocess a rolling window in your incremental model and include a WHERE clause on a reliable watermark.

13) Show an incremental model pattern youd use for 40M+ rows.

-- models/marts/fct_orders.sql
{{ config(materialized='incremental', unique_key='order_id', on_schema_change='sync_all_columns') }}
with src as (
  select *
  from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where created_at >= dateadd(day, -3, (select max(created_at) from {{ this }}))
  {% endif %}
)
select
  order_id,
  created_at,
  total_amount
from src

Pick a window that balances correctness and cost. For BigQuery, partition and cluster to speed merges; for Snowflake, ensure filters align with micro-partition pruning.

14) How do you manage schemas and aliases?

Use environments via target.name and per-folder +schema configs. Alias long model names when necessary for BI compatibility.

# dbt_project.yml
models:
  marts:
    +schema: analytics
    +materialized: table
    sales:
      dim_customers:
        +alias: customers

15) How do you optimize for Snowflake vs BigQuery specifics?

  • Snowflake: Respect clustering by frequently-filtered columns if tables are huge; leverage merge patterns; watch for micro-partition pruning. Avoid unnecessary order by in CREATE.
  • BigQuery: Use partition by and cluster by in configs; prefer merge over append+dedupe for correctness; mind slot usage and bytes processed.
-- BigQuery config example
{{ config(
  materialized='incremental',
  unique_key='order_id',
  partition_by={'field': 'created_at', 'data_type': 'timestamp', 'granularity': 'day'},
  cluster_by=['customer_id']
) }}

16) When do you use ephemeral models?

When logic is reused but doesnt need a physical object. Example: a dimensioned calendar CTE or a dedupe step you call from multiple models. It keeps warehouses tidy and improves performance by inlining SQL.

-- models/common/dedupe_orders.sql
{{ config(materialized='ephemeral') }}
select as value * except(rn) from (
  select *, row_number() over (partition by order_id order by updated_at desc) rn
  from {{ ref('stg_orders') }}
) where rn=1

17) How do you document and expose models for BI?

Use description, meta, and exposures to show lineage from BI dashboards back to marts. Document columns, definitions, and caveats. Add doc blocks for business logic. This is essential data quality hygiene.

version: 2
models:
  - name: fct_orders
    description: Daily grain orders fact with financial fields.
    columns:
      - name: total_amount
        description: Final charge amount in USD.
exposures:
  - name: revenue_dashboard
    type: dashboard
    depends_on: [ref('fct_orders')]
    owner: {name: Finance, email: finops@example.com}

18) How do you share reusable logic across projects?

Package macros and models into a package (e.g., packages.yml) and version it. Keep warehouse-agnostic logic in macros; use adapter dispatch for database-specific SQL. This is how you integrate dbt into a larger platform architecture.

# packages.yml
packages:
  - package: dbt-labs/codegen
    version: 0.12.1

Tests, Deployment, and Ops: 6 questions

19) How do you create custom tests?

Define a generic test macro returning failing rows; call it in YAML with arguments.

-- tests/not_negative.sql
{% test not_negative(model, column_name) %}
select * from {{ model }} where {{ column_name }} < 0
{% endtest %}
version: 2
models:
  - name: fct_orders
    columns:
      - name: total_amount
        tests:
          - not_negative

20) How do you monitor sources and freshness?

Declare freshness on sources and schedule dbt source freshness (or dbt build) to detect delays. Alert on WARN/ERROR thresholds to protect SLAs and data pipeline expectations.

version: 2
sources:
  - name: sales
    schema: raw
    tables:
      - name: orders
        freshness:
          warn_after: {count: 2, period: hour}
          error_after: {count: 6, period: hour}

21) How do you handle secrets and sensitive data?

Do not hardcode credentials or PII. Use warehouse RBAC, column masking, and env_var() for keys. Restrict schemas and leverage views to expose only necessary fields. For dbt Cloud, use environment variables and job-level credentials. For dbt Core, use profiles.yml with secure secrets storage.

-- models/pii_masked_customers.sql
select
  customer_id,
  case when {{ env_var('PII_ENABLED','false') }} = 'true' then email else null end as email
from {{ ref('stg_customers') }}

This answer touches security measures and how to handle sensitive data in dbt models without over-explaining secrets management vendors.

22) How do you deploy: dbt Cloud vs Core?

dbt Cloud provides hosted IDE, jobs, logging, and RBAC. dbt Core is the open-source CLI you run in your own CI. Choose based on team size and platform constraints. For a full comparison, skim dbt Cloud vs Core: Feature Comparison 2025Comprehensive Guide. Mentioning dbt core and dbt together clarifies choices: you can adopt dbt core and dbt Cloud interchangeably per environment.

23) How would you schedule and orchestrate?

In dbt Cloud, configure jobs and triggers. In external schedulers, call the CLI in tasks. Keep deployments atomic and use selectors. For broader tradeoffs with orchestrators, see dbt Cloud vs. Airflow: Comparing Popular Data Engineering Tools in 2025.

# GitHub Actions (Core) example
name: dbt-ci
on: [pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: '3.11'}
      - run: pip install dbt-bigquery
      - run: dbt deps
      - run: dbt build --select state:modified+ --state target

24) How do you control scope with selectors and state?

Use --select with tags, directories, or state:modified+ to limit runs. Store a previous manifest and let dbt compare changes. This speeds feedback loops in CI and daily dbt run jobs.

25) What do you log, alert, and document in ops?

Log run artifacts (manifest.json, run_results.json), track test failures, and wire alerts to incident channels. Surface lineage and exposures to stakeholders. Tie failures to owners. This keeps data quality visible and accountable.

Scenario-based: 6 questions

26) Your orders table has 40M rows and nightly loads doubled runtime. What do you do?

Checklist:

  • Ensure incremental model with merge and a stable unique_key.
  • Switch to partitioned+clustered table (BigQuery) or align filters with micro-partitions (Snowflake).
  • Introduce a rolling watermark window (e.g., last 3 days) and backfill separately.
  • Inline heavy subqueries via ephemeral models to reduce intermediate I/O.
  • Prune columns and avoid cross joins; move UDFs to precomputed fields if possible.

Validate via explain plans and compare bytes scanned (BigQuery) or warehouse credits (Snowflake) after changes.

27) You have a circular dependency. How do you untangle it?

Find the cycle with dbt ls --graph or DAG view. Extract shared logic into an ephemeral model or macro, or split a model into two steps so ref() only points forward. If BI exposure queries back a mart, keep that out of dbt models to avoid feedback loops.

28) A breaking schema change hit a source at 9am. How do you contain blast radius?

Rely on source freshness and schema tests to fail fast. Use environment isolation: freeze prod while fixing staging. Add a compatibility shim in stg_* (e.g., cast renamed columns). Update sources YAML, add a test for the new contract, then resume. Communicate via exposures owners.

29) Finance needs a full backfill for last year but daily jobs must stay fast. Plan?

Create a backfill branch with a parameterized model. Run a one-time historical job that writes to a temp table, then insert overwrite/merge into prod. Keep your daily incremental model window small. Tag backfill models and exclude them from regular schedules.

30) Your team must redact emails for analysts but allow admins. How?

Implement warehouse-level masking policies (preferred) and use role-based views. In dbt, route analysts to a masked schema while admins hit the full mart. Use env_var() toggles sparingly for non-prod testing but dont rely on them for prod security boundaries.

Fast answers to common dbt interview questions (FAQ)

What is DBT and how does it fit into the data engineering workflow?

Its the data build tool for transforming warehouse tables with SQL. Ingest raw data first; then use dbt to model, test, and documentits ELTs T. It slots between ingestion and BI/reporting in a modern data stack.

How does DBT interact with Snowflake?

dbt uses the Snowflake connector to execute compiled SQL, manage schemas, and perform merge-based incremental models. It respects Snowflake roles, warehouses, micro-partition pruning, and can set session parameters via configs.

Can you explain how DBT works with BigQuery?

dbt uses the BigQuery adapter to run compiled SQL as jobs. Configure partition_by and cluster_by in model configs, and dbt will create native partitioned/clustered tables. Incremental models use merge. Billing is by bytes processed, so test queries matter.

What are the advantages of using DBT with a data lake?

If your lake is queryable via engines dbt supports (e.g., external tables or lakehouse patterns), you can keep governance in one place and centralize data transformation. Advantages: single codebase for dbt transformations, consistent testing, and lineage across lake and warehouse layers.

How does DBT handle incremental loads of data from source systems?

With the incremental materialization. dbt compiles conditional SQL with is_incremental() and performs merge or insert patterns keyed by unique_key. You supply a reliable watermark column and optional windowing.

What are the security measures taken by DBT to support data access and transformation?

dbt defers to your warehouse RBAC and network controls. In dbt Cloud, use SSO, SCIM, and job-level credentials. In Core, use profiles with least-privilege roles. Add masking, views, and schema isolation for PII, and avoid embedding secrets in code.

How do you handle sensitive data in DBT models?

Isolate PII in separate schemas, apply masking or views, and ensure tests dont emit PII into artifacts. Parameterize with env_var() for non-prod redaction tests and verify permissions at the warehouse level.

List out the primary use cases of DBT.

  • Transform raw data into staging and marts
  • Build incremental fact tables and dimensions
  • Enforce tests for data quality
  • Document lineage and exposures
  • Standardize logic via macros and packages

Do you think DBT is a programming language?

No. dbt is a framework that orchestrates SQL and Jinja with a CLI/Cloud service. You still write SQL to transform data.

Reference snippets youll likely show on-screen

Source + staging pattern

version: 2
sources:
  - name: payments
    schema: raw
    tables:
      - name: transactions
        columns:
          - name: amount
            tests: [not_null]
-- models/staging/payments/stg_transactions.sql
{{ config(materialized='view') }}
select
  cast(id as string) as transaction_id,
  cast(amount as numeric) as amount,
  parse_timestamp('%Y-%m-%dT%H:%M:%S', created_at) as created_at
from {{ source('payments','transactions') }}

Unit-like test via custom generic

-- tests/only_known_statuses.sql
{% test only_known_statuses(model, column_name, allowed) %}
select * from {{ model }} where {{ column_name }} not in ({{ allowed | join(",") }})
{% endtest %}
version: 2
models:
  - name: stg_orders
    columns:
      - name: status
        tests:
          - only_known_statuses:
              allowed: ['\'open\'','\'closed\'','\'cancelled\'']

Practical comparisons and gotchas

TopicBetter defaultWhyWhen to switch
Materializationtable for martsFaster BI, stableview for tiny dims
Incremental strategymergeUpserts dedupeappend+dedupe for append-only logs
Freshness checksWARN/ERROR thresholdsAlert on delaysDisable on volatile dev
Environmentsschema per targetIsolationdatabase per target for strict isolation

Expert tips for dbt interviews

  • Show the DAG and explain how dbt compiles and executes it. Use the phrase clearly: dbt compiles Jinja to SQL, then executes in dependency order.
  • Give one realistic incremental windowing example and one backfill plan.
  • Describe how you enforce contracts with tests and on_schema_change.
  • Know how to select scopes (state:modified+, tags) and how to run dbt in CI.
  • Be ready to discuss warehouse-specific tuning for BigQuery and Snowflake.

Deep-dive mini-answers to show expertise

How do you keep models small and composable?

Favor narrow dbt transformations, each with a single responsibility: staging casts and standardization; intermediate joins; marts for business logic. Keep each dbt model at one business grain.

What about data types and casting?

Cast early in staging to canonical data types; dont let implicit casts hide issues. Add accepted_values tests to enforce enumerations and not_null on primary keys to strengthen contracts.

How do you gate changes safely?

Use branches, CI dbt build, state-based selection, and a shadow schema for dry runs. Promote to prod schemas via PR. In Cloud, separate dev/prod jobs with different targets.

How do you document for stakeholders?

Describe metrics, grain, and caveats in YAML. Add exposures for dashboards with owners. Link run artifacts to tickets. This builds trust and surfaces data quality ownership.

Search-intent wrap-up

This set of dbt interview questions compresses what hiring teams probe: DAG fluency, incremental design, warehouse tuning, testing, and deployment. Use it as rapid-fire interview questions and answers and bring code snippets ready to paste in a shared screen. If you need structured practice beyond dbt, revisit the Analytics Engineering Interview Mistakes guide for patterns to avoid.

Appendix: Short definitions you can quote

  • dbt build tool: the framework used to model, test, and document SQL transformations in your warehouse.
  • Incremental model: a materialization that updates only changed/new rows based on a key and watermark.
  • Packages: reusable bundles of models/macros versioned via packages.yml.
  • Tests: SQL assertions run by dbt test or dbt build to enforce contracts.

Platform notes, integrations, and scope

You can integrate dbt with your orchestrator by calling the CLI in tasks, or run fully in dbt Cloud. Keep job scope tight and prefer idempotent steps. If streaming is on your roadmap, read the snapshot of patterns in Building Streaming Data Models with dbt & Kafka: A Modern Guide for design choices; we wont retread those details here.

Strategy: how to tell your story

Interviewers want signal that you can master dbt at production scale. Walk through a small dbt project youve built, explain your model layers, an incremental model design, your tests, and how you deploy. Show the lineage graph, call out one production fire you solved, and the long-term fix.

Final pointers

  • Keep answers grounded in the data warehouse youve actually run (Snowflake or BigQuery specifics).
  • If asked What are your favorite dbt questions?, reply with topics that show you think in systems: lineage, contracts, workloads, and recovery.
  • For a full stack comparison mindset, you can also mention tradeoffs from dbt Cloud vs Core without going deep.

Glossary hits for SEO (you can weave these into your answers)

Use these phrasing anchors once if helpful when answering live: dbt core and dbt, dbt transformations, data transformation and modeling, preparing for a dbt interview, interview questions and answers, dbt interview questions and answers, integrate dbt, master dbt.

Finally, if you want to use dbt hands-on and get graded feedback, try our free practice prompts at /practice.

Next steps

Take this concept into practice.

Reading is fine. Doing is what gets you hired. Pick an exercise on this topic or open a portfolio project.