30 dbt Interview Questions and Answers (From Real Screens)
Get 30 practitioner-grade dbt interview questions and answers grouped by fundamentals, modeling, tests/deployment, and scenarios—with real SQL/Jinja examples.
Here are 30 practitioner-grade dbt interview questions and answers you can use to prep quickly and go deep. We group them by fundamentals, modeling, tests/deployment, and scenario-based problem solving. Each answer is concise and includes realistic SQL/Jinja/YAML where it matters. If youre preparing for a dbt interview and want more breadth, see the Interviews topic hub and our broader Interview Prep: 50 Questions and Answers for Analytics Engineer Roles.
Fundamentals: 10 questions
1) What is Data Build Tool (DBT)?
dbt is the data build tool that lets you transform data in your data warehouse using SQL and Jinja. It manages dependencies between models, compiles templated SQL, runs statements, and tests results. Teams use it to turn raw data into well-modeled, documented tables and views.
2) Where does dbt fit in the data engineering workflow?
dbt sits in ELT after load. Upstream ingestion puts raw data into the data warehouse; dbt handles data transformation and modeling: staging, marts, tests, docs, and deployment. It complements your orchestrator and ingestion tools in a modern data pipeline.
3) How does dbt work under the hood?
dbt compiles your Jinja+SQL to pure SQL, builds a DAG via ref()/source(), and executes statements in topological order. The execution of dbt models can be materialized as views, tables, or an incremental model. Think: parse compile run test document.
# Example: compiled order
$ dbt build --select dim_orders
# dbt compiles Jinja to SQL then runs it in your warehouse
4) How do ref() and source() work?
ref('model_name') declares dependencies and points to built models; source('name','table') points to declared sources and protects against schema drift. This enables environment-agnostic SQL and a reliable DAG.
-- models/stg_orders.sql
with src as (
select *
from {{ source('sales', 'orders') }}
)
select * from src
# models/sources.yml
version: 2
sources:
- name: sales
schema: raw
tables:
- name: orders
The phrase source in dbt refers to that YAML declaration of upstream objects.
5) Compare materializations: view vs table vs incremental vs ephemeral.
| Materialization | Storage | Pros | Typical use |
|---|---|---|---|
| view | No table storage | Always fresh, simple | Light transforms, low cost |
| table | Physical table | Fast reads, stable | Heavy transforms, marts |
| incremental | Physical table, append/merge | Scales for large data | Big fact tables, CDC |
| ephemeral | No relation; inlined CTE | No objects clutter | Reusable logic chunks |
6) Whats a seed and when use it?
A seed is a CSV checked into your dbt project that dbt loads as a small table (lookup lists, country codes, SLA thresholds). Its version-controlled and deployable like models.
# seeds/sla_thresholds.csv
entity,priority,max_hours
orders,high,4
orders,medium,24
# dbt_project.yml
seeds:
+schema: reference
7) What is a snapshot?
Snapshots capture slowly changing records by tracking field changes over time (SCD2 behavior). They use check or timestamp strategies to version records.
-- snapshots/customers.sql
{% snapshot customers_snapshot %}
{{ config(target_schema='snapshots', unique_key='customer_id', strategy='check', check_cols=['email','status']) }}
select * from {{ source('crm','customers') }}
{% endsnapshot %}
8) How do Jinja and macros help?
Jinja lets you parameterize SQL; macros package reusable logic. Use them to DRY up joins and constraints. For deeper patterns, see dbt Macros & Jinja Tips Every Analytics Engineer Should Know: Expert Guide. Using dbt macros wisely keeps modeling consistent.
-- macros/surrogate_key.sql
{% macro surrogate_key(cols) %}
md5(concat_ws('||', {% for c in cols %}coalesce({{ c }}, ''), {% endfor %} ''))
{% endmacro %}
9) How do dbt tests work?
Generic tests (e.g., unique, not_null) and custom tests run SQL assertions. Put them in YAML near models and sources.
version: 2
models:
- name: dim_customers
columns:
- name: customer_id
tests: [not_null, unique]
10) Whats the difference between dbt run and dbt build?
dbt run executes models only. dbt build executes models, tests, and snapshots for a selected graph. Prefer dbt build in CI to validate end-to-end. You can also run dbt with selectors and state, or literally run dbt via job orchestration.
Modeling: 8 questions
11) How would you layer a dbt project?
Common pattern: stg_* (1:1 modeling to raw data, type casting, light conforming), int_* (intermediate joins/rollups), dim_* and fct_* (business-ready marts). Keep names consistent, one primary key per model, and document each dbt model.
-- models/staging/sales/stg_orders.sql
{{ config(materialized='view') }}
select
cast(id as bigint) as order_id,
cast(created_at as timestamp) as created_at,
{{ surrogate_key(['id']) }} as order_sk
from {{ source('sales','orders') }}
12) How do you ensure idempotency and handle late-arriving data?
Use deterministic keys, avoid now() in logic, and prefer merge-based incremental models with a strict unique_key. For late-arriving facts, reprocess a rolling window in your incremental model and include a WHERE clause on a reliable watermark.
13) Show an incremental model pattern youd use for 40M+ rows.
-- models/marts/fct_orders.sql
{{ config(materialized='incremental', unique_key='order_id', on_schema_change='sync_all_columns') }}
with src as (
select *
from {{ ref('stg_orders') }}
{% if is_incremental() %}
where created_at >= dateadd(day, -3, (select max(created_at) from {{ this }}))
{% endif %}
)
select
order_id,
created_at,
total_amount
from src
Pick a window that balances correctness and cost. For BigQuery, partition and cluster to speed merges; for Snowflake, ensure filters align with micro-partition pruning.
14) How do you manage schemas and aliases?
Use environments via target.name and per-folder +schema configs. Alias long model names when necessary for BI compatibility.
# dbt_project.yml
models:
marts:
+schema: analytics
+materialized: table
sales:
dim_customers:
+alias: customers
15) How do you optimize for Snowflake vs BigQuery specifics?
- Snowflake: Respect clustering by frequently-filtered columns if tables are huge; leverage
mergepatterns; watch for micro-partition pruning. Avoid unnecessaryorder byin CREATE. - BigQuery: Use
partition byandcluster byin configs; prefermergeover append+dedupe for correctness; mind slot usage and bytes processed.
-- BigQuery config example
{{ config(
materialized='incremental',
unique_key='order_id',
partition_by={'field': 'created_at', 'data_type': 'timestamp', 'granularity': 'day'},
cluster_by=['customer_id']
) }}
16) When do you use ephemeral models?
When logic is reused but doesnt need a physical object. Example: a dimensioned calendar CTE or a dedupe step you call from multiple models. It keeps warehouses tidy and improves performance by inlining SQL.
-- models/common/dedupe_orders.sql
{{ config(materialized='ephemeral') }}
select as value * except(rn) from (
select *, row_number() over (partition by order_id order by updated_at desc) rn
from {{ ref('stg_orders') }}
) where rn=1
17) How do you document and expose models for BI?
Use description, meta, and exposures to show lineage from BI dashboards back to marts. Document columns, definitions, and caveats. Add doc blocks for business logic. This is essential data quality hygiene.
version: 2
models:
- name: fct_orders
description: Daily grain orders fact with financial fields.
columns:
- name: total_amount
description: Final charge amount in USD.
exposures:
- name: revenue_dashboard
type: dashboard
depends_on: [ref('fct_orders')]
owner: {name: Finance, email: finops@example.com}
18) How do you share reusable logic across projects?
Package macros and models into a package (e.g., packages.yml) and version it. Keep warehouse-agnostic logic in macros; use adapter dispatch for database-specific SQL. This is how you integrate dbt into a larger platform architecture.
# packages.yml
packages:
- package: dbt-labs/codegen
version: 0.12.1
Tests, Deployment, and Ops: 6 questions
19) How do you create custom tests?
Define a generic test macro returning failing rows; call it in YAML with arguments.
-- tests/not_negative.sql
{% test not_negative(model, column_name) %}
select * from {{ model }} where {{ column_name }} < 0
{% endtest %}
version: 2
models:
- name: fct_orders
columns:
- name: total_amount
tests:
- not_negative
20) How do you monitor sources and freshness?
Declare freshness on sources and schedule dbt source freshness (or dbt build) to detect delays. Alert on WARN/ERROR thresholds to protect SLAs and data pipeline expectations.
version: 2
sources:
- name: sales
schema: raw
tables:
- name: orders
freshness:
warn_after: {count: 2, period: hour}
error_after: {count: 6, period: hour}
21) How do you handle secrets and sensitive data?
Do not hardcode credentials or PII. Use warehouse RBAC, column masking, and env_var() for keys. Restrict schemas and leverage views to expose only necessary fields. For dbt Cloud, use environment variables and job-level credentials. For dbt Core, use profiles.yml with secure secrets storage.
-- models/pii_masked_customers.sql
select
customer_id,
case when {{ env_var('PII_ENABLED','false') }} = 'true' then email else null end as email
from {{ ref('stg_customers') }}
This answer touches security measures and how to handle sensitive data in dbt models without over-explaining secrets management vendors.
22) How do you deploy: dbt Cloud vs Core?
dbt Cloud provides hosted IDE, jobs, logging, and RBAC. dbt Core is the open-source CLI you run in your own CI. Choose based on team size and platform constraints. For a full comparison, skim dbt Cloud vs Core: Feature Comparison 2025Comprehensive Guide. Mentioning dbt core and dbt together clarifies choices: you can adopt dbt core and dbt Cloud interchangeably per environment.
23) How would you schedule and orchestrate?
In dbt Cloud, configure jobs and triggers. In external schedulers, call the CLI in tasks. Keep deployments atomic and use selectors. For broader tradeoffs with orchestrators, see dbt Cloud vs. Airflow: Comparing Popular Data Engineering Tools in 2025.
# GitHub Actions (Core) example
name: dbt-ci
on: [pull_request]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: {python-version: '3.11'}
- run: pip install dbt-bigquery
- run: dbt deps
- run: dbt build --select state:modified+ --state target
24) How do you control scope with selectors and state?
Use --select with tags, directories, or state:modified+ to limit runs. Store a previous manifest and let dbt compare changes. This speeds feedback loops in CI and daily dbt run jobs.
25) What do you log, alert, and document in ops?
Log run artifacts (manifest.json, run_results.json), track test failures, and wire alerts to incident channels. Surface lineage and exposures to stakeholders. Tie failures to owners. This keeps data quality visible and accountable.
Scenario-based: 6 questions
26) Your orders table has 40M rows and nightly loads doubled runtime. What do you do?
Checklist:
- Ensure incremental model with
mergeand a stableunique_key. - Switch to partitioned+clustered table (BigQuery) or align filters with micro-partitions (Snowflake).
- Introduce a rolling watermark window (e.g., last 3 days) and backfill separately.
- Inline heavy subqueries via ephemeral models to reduce intermediate I/O.
- Prune columns and avoid cross joins; move UDFs to precomputed fields if possible.
Validate via explain plans and compare bytes scanned (BigQuery) or warehouse credits (Snowflake) after changes.
27) You have a circular dependency. How do you untangle it?
Find the cycle with dbt ls --graph or DAG view. Extract shared logic into an ephemeral model or macro, or split a model into two steps so ref() only points forward. If BI exposure queries back a mart, keep that out of dbt models to avoid feedback loops.
28) A breaking schema change hit a source at 9am. How do you contain blast radius?
Rely on source freshness and schema tests to fail fast. Use environment isolation: freeze prod while fixing staging. Add a compatibility shim in stg_* (e.g., cast renamed columns). Update sources YAML, add a test for the new contract, then resume. Communicate via exposures owners.
29) Finance needs a full backfill for last year but daily jobs must stay fast. Plan?
Create a backfill branch with a parameterized model. Run a one-time historical job that writes to a temp table, then insert overwrite/merge into prod. Keep your daily incremental model window small. Tag backfill models and exclude them from regular schedules.
30) Your team must redact emails for analysts but allow admins. How?
Implement warehouse-level masking policies (preferred) and use role-based views. In dbt, route analysts to a masked schema while admins hit the full mart. Use env_var() toggles sparingly for non-prod testing but dont rely on them for prod security boundaries.
Fast answers to common dbt interview questions (FAQ)
What is DBT and how does it fit into the data engineering workflow?
Its the data build tool for transforming warehouse tables with SQL. Ingest raw data first; then use dbt to model, test, and documentits ELTs T. It slots between ingestion and BI/reporting in a modern data stack.
How does DBT interact with Snowflake?
dbt uses the Snowflake connector to execute compiled SQL, manage schemas, and perform merge-based incremental models. It respects Snowflake roles, warehouses, micro-partition pruning, and can set session parameters via configs.
Can you explain how DBT works with BigQuery?
dbt uses the BigQuery adapter to run compiled SQL as jobs. Configure partition_by and cluster_by in model configs, and dbt will create native partitioned/clustered tables. Incremental models use merge. Billing is by bytes processed, so test queries matter.
What are the advantages of using DBT with a data lake?
If your lake is queryable via engines dbt supports (e.g., external tables or lakehouse patterns), you can keep governance in one place and centralize data transformation. Advantages: single codebase for dbt transformations, consistent testing, and lineage across lake and warehouse layers.
How does DBT handle incremental loads of data from source systems?
With the incremental materialization. dbt compiles conditional SQL with is_incremental() and performs merge or insert patterns keyed by unique_key. You supply a reliable watermark column and optional windowing.
What are the security measures taken by DBT to support data access and transformation?
dbt defers to your warehouse RBAC and network controls. In dbt Cloud, use SSO, SCIM, and job-level credentials. In Core, use profiles with least-privilege roles. Add masking, views, and schema isolation for PII, and avoid embedding secrets in code.
How do you handle sensitive data in DBT models?
Isolate PII in separate schemas, apply masking or views, and ensure tests dont emit PII into artifacts. Parameterize with env_var() for non-prod redaction tests and verify permissions at the warehouse level.
List out the primary use cases of DBT.
- Transform raw data into staging and marts
- Build incremental fact tables and dimensions
- Enforce tests for data quality
- Document lineage and exposures
- Standardize logic via macros and packages
Do you think DBT is a programming language?
No. dbt is a framework that orchestrates SQL and Jinja with a CLI/Cloud service. You still write SQL to transform data.
Reference snippets youll likely show on-screen
Source + staging pattern
version: 2
sources:
- name: payments
schema: raw
tables:
- name: transactions
columns:
- name: amount
tests: [not_null]
-- models/staging/payments/stg_transactions.sql
{{ config(materialized='view') }}
select
cast(id as string) as transaction_id,
cast(amount as numeric) as amount,
parse_timestamp('%Y-%m-%dT%H:%M:%S', created_at) as created_at
from {{ source('payments','transactions') }}
Unit-like test via custom generic
-- tests/only_known_statuses.sql
{% test only_known_statuses(model, column_name, allowed) %}
select * from {{ model }} where {{ column_name }} not in ({{ allowed | join(",") }})
{% endtest %}
version: 2
models:
- name: stg_orders
columns:
- name: status
tests:
- only_known_statuses:
allowed: ['\'open\'','\'closed\'','\'cancelled\'']
Practical comparisons and gotchas
| Topic | Better default | Why | When to switch |
|---|---|---|---|
| Materialization | table for marts | Faster BI, stable | view for tiny dims |
| Incremental strategy | merge | Upserts dedupe | append+dedupe for append-only logs |
| Freshness checks | WARN/ERROR thresholds | Alert on delays | Disable on volatile dev |
| Environments | schema per target | Isolation | database per target for strict isolation |
Expert tips for dbt interviews
- Show the DAG and explain how dbt compiles and executes it. Use the phrase clearly: dbt compiles Jinja to SQL, then executes in dependency order.
- Give one realistic incremental windowing example and one backfill plan.
- Describe how you enforce contracts with tests and
on_schema_change. - Know how to select scopes (
state:modified+, tags) and how to run dbt in CI. - Be ready to discuss warehouse-specific tuning for BigQuery and Snowflake.
Deep-dive mini-answers to show expertise
How do you keep models small and composable?
Favor narrow dbt transformations, each with a single responsibility: staging casts and standardization; intermediate joins; marts for business logic. Keep each dbt model at one business grain.
What about data types and casting?
Cast early in staging to canonical data types; dont let implicit casts hide issues. Add accepted_values tests to enforce enumerations and not_null on primary keys to strengthen contracts.
How do you gate changes safely?
Use branches, CI dbt build, state-based selection, and a shadow schema for dry runs. Promote to prod schemas via PR. In Cloud, separate dev/prod jobs with different targets.
How do you document for stakeholders?
Describe metrics, grain, and caveats in YAML. Add exposures for dashboards with owners. Link run artifacts to tickets. This builds trust and surfaces data quality ownership.
Search-intent wrap-up
This set of dbt interview questions compresses what hiring teams probe: DAG fluency, incremental design, warehouse tuning, testing, and deployment. Use it as rapid-fire interview questions and answers and bring code snippets ready to paste in a shared screen. If you need structured practice beyond dbt, revisit the Analytics Engineering Interview Mistakes guide for patterns to avoid.
Appendix: Short definitions you can quote
- dbt build tool: the framework used to model, test, and document SQL transformations in your warehouse.
- Incremental model: a materialization that updates only changed/new rows based on a key and watermark.
- Packages: reusable bundles of models/macros versioned via
packages.yml. - Tests: SQL assertions run by
dbt testordbt buildto enforce contracts.
Platform notes, integrations, and scope
You can integrate dbt with your orchestrator by calling the CLI in tasks, or run fully in dbt Cloud. Keep job scope tight and prefer idempotent steps. If streaming is on your roadmap, read the snapshot of patterns in Building Streaming Data Models with dbt & Kafka: A Modern Guide for design choices; we wont retread those details here.
Strategy: how to tell your story
Interviewers want signal that you can master dbt at production scale. Walk through a small dbt project youve built, explain your model layers, an incremental model design, your tests, and how you deploy. Show the lineage graph, call out one production fire you solved, and the long-term fix.
Final pointers
- Keep answers grounded in the data warehouse youve actually run (Snowflake or BigQuery specifics).
- If asked What are your favorite dbt questions?, reply with topics that show you think in systems: lineage, contracts, workloads, and recovery.
- For a full stack comparison mindset, you can also mention tradeoffs from dbt Cloud vs Core without going deep.
Glossary hits for SEO (you can weave these into your answers)
Use these phrasing anchors once if helpful when answering live: dbt core and dbt, dbt transformations, data transformation and modeling, preparing for a dbt interview, interview questions and answers, dbt interview questions and answers, integrate dbt, master dbt.
Finally, if you want to use dbt hands-on and get graded feedback, try our free practice prompts at /practice.
Take this concept into practice.
Reading is fine. Doing is what gets you hired. Pick an exercise on this topic or open a portfolio project.
Part of these learning paths.
- Interviews
Analytics Engineering Interview Mistakes and How to Avoid Them: A Complete Guide
Discover common analytics engineering interview mistakes and learn strategies to avoid them. Enhance your preparation and communication skills for success.
- Interviews
Interview Prep: 50 Questions and Answers for Analytics Engineer Roles
Prepare for analytics engineer interviews with 50 essential questions. This guide covers technical skills, data modeling, and problem-solving scenarios.
- Interviews
Common SQL Mistakes Beginners Make During Interviews: Essential Pitfalls to Avoid
Discover common SQL interview mistakes like JOIN errors and WHERE clause neglect. Learn how to improve your problem-solving approach for better outcomes.
Drill it in the exercise library.
Portfolio-ready builds on this topic.
- intermediate · open →
SQL Alien Invasion Challenge: Defend Earth
Crisis-response analytics: defend Earth with multi-table joins, aggregation, and CTEs.
- advanced · open →
SQL Mystery Challenge: The Case of the Vanishing Artifacts
Investigative SQL: follow the evidence across museum audit logs to unmask a thief.
