Schema Design: One Fact, One Place

Concept. Schema design decides which tables hold which columns. One rule does most of the work: store each fact once, in one place.

Intuition. Put everything in one wide table and a user's email rides along on every listen. The day Mickey changes his email, you have to fix it in every row, and you will miss one. Split the repeated facts into their own tables so each lives once. That split is normalization.

Example. The Spotify schema you have used since day one is already normalized. That is why it is three tables (Users, Songs, Listens), not one.

The problem: one big table repeats itself

Figure 1. Repetition is the root problem. Because Mickey's email and each song's artist are copied onto every listen, three kinds of bug become possible. An update has to touch many rows and can leave them disagreeing. An insert cannot record a song until someone plays it, because every row is a listen. A delete can erase a fact that happened to live in only one row. These are the update, insert, and delete anomalies.
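The three anomalies can be reproduced in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `WideListens`, its columns, and the sample rows are all made up for illustration; they are not the course's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical one-big-table design: every row is a listen, and the user's
# email and the song's artist are copied onto each one.
conn.execute("""
    CREATE TABLE WideListens (
        listen_id  INTEGER PRIMARY KEY,
        user_name  TEXT NOT NULL,
        user_email TEXT NOT NULL,
        song_title TEXT NOT NULL,
        artist     TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO WideListens VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Mickey", "mickey@old.example", "Song A", "Artist X"),
        (2, "Mickey", "mickey@old.example", "Song B", "Artist Y"),
        (3, "Minnie", "minnie@example.com", "Song A", "Artist X"),
    ],
)

# Update anomaly: the fix reaches only one of Mickey's two rows,
# so the table now holds two different emails for the same user.
conn.execute(
    "UPDATE WideListens SET user_email = 'mickey@new.example' WHERE listen_id = 1"
)
mickey_emails = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT user_email FROM WideListens "
        "WHERE user_name = 'Mickey' ORDER BY user_email"
    )
]

# Insert anomaly: a song nobody has played has no listener to put in the row,
# so the NOT NULL listen columns reject it.
try:
    conn.execute(
        "INSERT INTO WideListens (song_title, artist) VALUES ('Song C', 'Artist Z')"
    )
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Delete anomaly: removing the only listen of Song B also erases
# the only record of who performed it.
conn.execute("DELETE FROM WideListens WHERE listen_id = 2")
song_b_rows = conn.execute(
    "SELECT COUNT(*) FROM WideListens WHERE song_title = 'Song B'"
).fetchone()[0]

print(mickey_emails)   # two emails for one user
print(insert_blocked)  # True
print(song_b_rows)     # 0
```

None of these statements is unusual. Each is an ordinary update, insert, or delete, which is why the wide table is fragile in day-to-day use.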

The fix: give each fact one home

Pull the repeated facts out into their own tables. A user's email lives once in Users. A song's artist lives once in Songs. Listens keeps only the events, and points at the other two by key.

Figure 2. This split is normalization. One strict, formal version of it is Boyce-Codd Normal Form (BCNF), but the intuition is the whole point: no fact is stored twice, so nothing can drift out of sync. Change Mickey's email in one row of Users and every query sees the new one. A foreign key, like `Listens.user_id`, is just a pointer to the single home of a fact.
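A minimal sketch of the three-table split, again using Python's `sqlite3` with an in-memory database. Only the table names and `Listens.user_id` come from the text; the other column names (`name`, `email`, `title`, `artist`, `song_id`, `listen_id`) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE Users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        email   TEXT NOT NULL      -- the one home of each user's email
    );
    CREATE TABLE Songs (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL      -- the one home of each song's artist
    );
    CREATE TABLE Listens (
        listen_id INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES Users(user_id),
        song_id   INTEGER NOT NULL REFERENCES Songs(song_id)
    );
""")

conn.execute("INSERT INTO Users VALUES (1, 'Mickey', 'mickey@old.example')")
conn.execute("INSERT INTO Songs VALUES (1, 'Song A', 'Artist X')")
# A song can now exist before anyone has played it.
conn.execute("INSERT INTO Songs VALUES (2, 'Song B', 'Artist Y')")
conn.executemany("INSERT INTO Listens VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 1)])

# One UPDATE, one row: the email has a single home.
rows_changed = conn.execute(
    "UPDATE Users SET email = 'mickey@new.example' WHERE user_id = 1"
).rowcount

# Every listen reaches the email through its key, so all see the new value.
emails_seen = [
    row[0]
    for row in conn.execute(
        "SELECT u.email FROM Listens AS l "
        "JOIN Users AS u ON u.user_id = l.user_id "
        "ORDER BY l.listen_id"
    )
]

# The foreign key refuses a listen that points at a user who does not exist.
try:
    conn.execute("INSERT INTO Listens VALUES (3, 99, 1)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(rows_changed)       # 1
print(emails_seen)        # the new email, once per listen
print(dangling_rejected)  # True
```

The email changes in exactly one row, yet both of Mickey's listens report the new address, because neither listen ever stored a copy of it.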

Splitting the table is exactly why you JOIN. A query recombines the facts you separated, so every join you wrote in Module 1 was putting a normalized schema back together to answer one question. Normalization and joins are two halves of one design: pull facts apart to store them safely, join them back to use them.
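To make the recombination concrete, here is a small sketch in Python's `sqlite3` that rebuilds the wide, one-row-per-listen view from three normalized tables. As before, the column names other than `Listens.user_id` and all sample data are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        email   TEXT NOT NULL
    );
    CREATE TABLE Songs (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL
    );
    CREATE TABLE Listens (
        listen_id INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES Users(user_id),
        song_id   INTEGER NOT NULL REFERENCES Songs(song_id)
    );
    INSERT INTO Users VALUES (1, 'Mickey', 'mickey@example.com');
    INSERT INTO Users VALUES (2, 'Minnie', 'minnie@example.com');
    INSERT INTO Songs VALUES (1, 'Song A', 'Artist X');
    INSERT INTO Songs VALUES (2, 'Song B', 'Artist Y');
    INSERT INTO Listens VALUES (1, 1, 1);
    INSERT INTO Listens VALUES (2, 1, 2);
    INSERT INTO Listens VALUES (3, 2, 1);
""")

# Follow each listen's two keys back to the single home of each fact.
wide_rows = conn.execute("""
    SELECT u.name, u.email, s.title, s.artist
    FROM Listens AS l
    JOIN Users AS u ON u.user_id = l.user_id
    JOIN Songs AS s ON s.song_id = l.song_id
    ORDER BY l.listen_id
""").fetchall()

for row in wide_rows:
    print(row)
```

The result looks like the one big table, but it is computed on demand from single copies rather than stored, so it cannot disagree with itself.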

The trade-off: normalize first, denormalize only when forced

Figure 3. Normalization buys correctness and pays with joins. Denormalization buys read speed and pays with the risk that copies of the same fact disagree. Default to normalized; reach for denormalization only when a real, measured read cost forces it. You will meet denormalization again at scale in Module 5, where copying data on purpose is how a system serves millions of reads.
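The cost side of that trade can be sketched in a few lines. In this hypothetical variant, Listens carries its own copy of the artist (the column `artist_copy` is invented for this example) so that a read can skip the join. It is one possible shape of denormalization, not the design Module 5 necessarily uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL
    );
    CREATE TABLE Listens (
        listen_id   INTEGER PRIMARY KEY,
        song_id     INTEGER NOT NULL REFERENCES Songs(song_id),
        artist_copy TEXT NOT NULL   -- deliberate duplicate of Songs.artist
    );
    INSERT INTO Songs VALUES (1, 'Song A', 'Artist X');
    INSERT INTO Listens VALUES (1, 1, 'Artist X');
    INSERT INTO Listens VALUES (2, 1, 'Artist X');
""")

# The benefit: this read needs no join.
fast_read = [
    row[0]
    for row in conn.execute("SELECT artist_copy FROM Listens ORDER BY listen_id")
]

# The cost: a change made only in the fact's home leaves the copies behind.
conn.execute("UPDATE Songs SET artist = 'Artist X (renamed)' WHERE song_id = 1")
stale_copies = conn.execute("""
    SELECT COUNT(*)
    FROM Listens AS l
    JOIN Songs AS s ON s.song_id = l.song_id
    WHERE l.artist_copy <> s.artist
""").fetchone()[0]

print(fast_read)     # ['Artist X', 'Artist X']
print(stale_copies)  # 2
```

Every write path now has to remember to refresh the copies. That extra obligation is the price of the faster read, and the reason to pay it only when a measured read cost demands it.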

So the three-table Spotify schema was a design choice all along: one fact, one place, recombined by joins. Keep the rule in your head every time you sketch a table. If a column repeats the same value down the rows, it probably belongs in a table of its own.