GlossaryData modeling

Grain

Also: granularity

The precise definition of what a single row in a table represents. Declaring the grain — e.g. “one row per order line per day” — is the first and most important step in modeling a fact table.

Grain answers one question: what does one row mean? “One row per order,” “one row per order line,” and “one row per customer per day” are three different grains that produce three different tables and three different sets of valid metrics.

Declaring the grain up front prevents the most common modeling bug: mixing grains in one table, which silently double-counts measures the moment someone joins or aggregates. If you can't state the grain in a single sentence, you don't understand the table yet.

A good habit: write the grain as a comment at the top of every fact model, and add a uniqueness test on the columns that define it. If the test fails, your grain assumption is wrong.

Why it matters

Grain is the contract a table makes with everyone who queries it. A clear, tested grain is the difference between metrics you can trust and a table where every SUM is a coin flip.

It's also the most common senior-interview probe in data modeling, because it instantly reveals whether someone has actually shipped reliable models.

Common mistakes
  • Failing to declare the grain before writing SQL, then discovering mixed grains after metrics are already wrong in a dashboard.
  • Assuming a 'unique-looking' key is actually unique without a test to prove it.
FAQ
How do I choose the right grain?
Pick the lowest (most atomic) grain the business needs — usually one row per real-world event. You can always aggregate up; you can't recover detail you didn't store.

Learn this by building, not memorizing.

Definitions get you the vocabulary. The platform gets you the skill — graded exercises, real projects, and a portfolio capstone.