The portfolio guide

Build a portfolio hiring managers actually read.

What separates a strong analytics-engineering portfolio from a collection of notebooks — what to include, what to leave out, how to present it on GitHub, and how to walk through it in an interview.

What a strong portfolio has

Six things hiring managers screen for.

These come straight from analytics-engineering hiring managers at mid-to-large data teams. The same six show up in every screen.

Realistic business context

Toy datasets (Iris, Titanic, Northwind) suggest you haven't worked with real data. The strongest portfolio projects use scenarios a working analytics engineer might be handed: orders, inventory, events, metric layers. The brief should read like a Linear ticket.

End-to-end, not a notebook

A 200-cell notebook is not a portfolio piece. A portfolio piece is a Git repo with sources, staging models, intermediate models, marts, tests, docs, and a README. If a stakeholder can't read your README and understand what you built, the project isn't done.

Tests + documentation, not just transformations

Hiring managers screen for engineering rigor, not just SQL fluency. Every model should have a uniqueness test and a not-null test on its primary key. Every column in a mart should have a written description. This is what separates analyst work from analytics engineering.
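
In dbt, these checks are usually declared as the built-in unique and not_null tests in a YAML file. As a rough illustration of what they verify, here is a minimal Python sketch against an in-memory SQLite table. The table name fct_orders, its columns, and the helper failing_rows are hypothetical, chosen only for this example.

```python
import sqlite3


def failing_rows(conn, table, column):
    """Return (duplicate_key_count, null_key_count) for a primary-key column.

    Hypothetical helper: it mirrors the idea behind a uniqueness test and a
    not-null test, not dbt's actual implementation.
    """
    # Uniqueness: count key values that appear more than once.
    duplicates = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    # Not-null: count rows where the key is missing.
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    return duplicates, nulls


# Illustrative data: order_id 2 is duplicated and one row has no key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fct_orders VALUES (?, ?)",
    [(1, 20.0), (2, 35.5), (2, 35.5), (None, 12.0)],
)

duplicates, nulls = failing_rows(conn, "fct_orders", "order_id")
print(duplicates, nulls)  # prints: 1 1
```

A passing model returns zero for both counts. Anything above zero is worth a line in the README explaining how you found and fixed it.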

A clean Git history

Five logical commits (sources, staging, marts, tests, docs) read better than a hundred wip commits. Use a feature branch, open a real PR against main, write a PR body that explains what the project does, and then merge it.

Output you can show

A dbt project is invisible without an output. Add a tiny published Looker or Streamlit dashboard, or a documentation site (dbt docs generate + GitHub Pages). The hiring manager sees the chart first, then the SQL.

Honest framing

Don't claim production scale you didn't ship. The strongest portfolio framing is 'I built this project to learn X, and here's what I learned.' Hiring managers respond to the learning narrative more than to inflated stakes.

The repo structure

What a portfolio dbt project looks like on GitHub.

A clean repository structure is the easiest signal to send. Match this layout and a reviewer can skim your project in 90 seconds.

README.md
The first thing anyone reads. Lead with a 1-paragraph scenario, then the architecture diagram (source → staging → marts), then 'how to run this locally', then 'what's tested', then 'what I learned'.
models/sources.yml
Declare every raw table. Freshness checks where relevant. Source tests for not-null on PKs.
models/staging/
One file per source table. Standardize names, types, units. Document every column.
models/intermediate/
Business logic that isn't yet a final mart — joining staging tables, building rolling aggregates, etc.
models/marts/
The shipped tables. Star schema in 9 cases out of 10. Tests for uniqueness, not-null, relationships, and accepted-values where applicable.
tests/
Singular tests for anything generic tests don't cover. Edge cases worth pointing out in the README.
macros/
Only if you need one; macros for the sake of macros are a red flag. A small reusable helper is a green flag.
dashboards/ or docs/
A screenshot or two of the output. A link to the dbt docs site if you've generated one.
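
One way to self-check a repo against the layout above before sharing it is a small script. This is a sketch, not part of dbt or any standard tool. The function name missing_paths and the decision to treat macros/ as optional are assumptions made for this example.

```python
from pathlib import Path
import tempfile

# Paths taken from the layout above; macros/ is left out because it is optional.
REQUIRED = [
    "README.md",
    "models/sources.yml",
    "models/staging",
    "models/intermediate",
    "models/marts",
    "tests",
]
# At least one of these should exist to hold the visible output.
OUTPUT_DIRS = ["dashboards", "docs"]


def missing_paths(repo_root):
    """List expected paths that are absent under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = [p for p in REQUIRED if not (root / p).exists()]
    if not any((root / d).is_dir() for d in OUTPUT_DIRS):
        missing.append("dashboards/ or docs/")
    return missing


# Demo on a throwaway directory that contains only part of the layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("# Scenario\n")
    (root / "models" / "staging").mkdir(parents=True)
    (root / "models" / "marts").mkdir()
    (root / "models" / "sources.yml").write_text("version: 2\n")
    demo_missing = missing_paths(root)

print(demo_missing)
# prints: ['models/intermediate', 'tests', 'dashboards/ or docs/']
```

To check a real project, call missing_paths with the path to your own repo. An empty list means every expected piece is present.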

Recommended starting points

Four projects that map cleanly to the brief above.

Each one is a self-contained business scenario with realistic source tables and a clear final mart layer. Together they cover SQL, data modeling, and a full dbt+BigQuery build.

Walking through it in interviews

Four questions hiring managers always ask.

Prepare a 90-second answer for each. Open your repo, share your screen, and click while you talk. Show, don't just describe.

Walk me through this project.
Open the README. Summarize the scenario in your own words (90 seconds). Show the architecture diagram. Click into one mart model and explain what it answers. Click into one test and explain why it exists. End with what you'd change if you started over.
Why did you choose this dataset?
The honest answer wins. 'I wanted to practice star-schema design on something with realistic joins.' 'I wanted to ship a dbt project end-to-end from sources to docs.' Don't oversell — the hiring manager has seen the same five Kaggle datasets a hundred times.
What would you do differently?
Always have an answer ready. Examples: 'I'd add incremental models for the largest fact table.' 'I'd separate test data from production data with a dbt seed.' 'I'd add a freshness check on the orders source.' Showing self-critique is the strongest signal you can give.
What was the hardest part?
Pick a real technical problem, such as slowly changing dimensions, late-arriving facts, or a join that exploded the row count. Walk through how you diagnosed and fixed it. This is the question that separates good portfolios from great ones.
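
For the "join that exploded the row count" case, the diagnosis is easy to demonstrate live. The sketch below uses an in-memory SQLite database with made-up tables (stg_orders and stg_customers are hypothetical names). It compares row counts before and after the join, then looks for repeated keys on the side that was assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE stg_customers (customer_id INTEGER, segment TEXT);
    INSERT INTO stg_orders VALUES (1, 10), (2, 10), (3, 11);
    -- customer 10 appears twice, so the join key is not unique
    INSERT INTO stg_customers VALUES (10, 'retail'), (10, 'wholesale'), (11, 'retail');
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


# Step 1: compare row counts before and after the join.
rows_before = count("SELECT COUNT(*) FROM stg_orders")
rows_after = count(
    "SELECT COUNT(*) FROM stg_orders o "
    "JOIN stg_customers c ON o.customer_id = c.customer_id"
)

# Step 2: find keys that repeat on the side assumed to be one row per key.
repeated_keys = conn.execute(
    "SELECT customer_id, COUNT(*) FROM stg_customers "
    "GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()

print(rows_before, rows_after, repeated_keys)
# prints: 3 5 [(10, 2)]
```

Three orders became five rows because customer 10 matches twice. In an interview, the follow-up is how you fixed it: deduplicating the dimension, changing the grain, or adding a uniqueness test so it cannot regress.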

Ship a portfolio piece this month.

Pick a project, work through it end-to-end, and push it to GitHub. The capstone in the full course takes most students 20–40 hours and produces a portfolio piece you can talk through in any AE interview.

Four projects that map cleanly to the brief above.

Data Forge: The Lost Metrics

Sports Equipment Pro Shop

Champion Fantasy League

SQL Mystery Challenge: The Case of the Vanishing Artifacts
