Skip to main content

Database Normalization (1NF, 2NF, 3NF)

What You'll Learn

By the end of this page, you will be able to:

  • Explain why normalization matters — what problems it solves
  • Define 1NF, 2NF, and 3NF with their exact requirements
  • Transform an unnormalized table step-by-step through each normal form
  • Identify functional dependencies and transitive dependencies
  • Know when to denormalize for performance

The Problem: Redundant & Inconsistent Data

Imagine a single table storing everything about orders:

order_idcustomer_namecustomer_emailproduct_namesquantitiesprices
1Alicealice@mail.comLaptop, Mouse1, 2999, 25
2Alicealice@mail.comKeyboard179

Problems:

  • Update anomaly — Alice's email is duplicated. Updating it in one row but not the other causes inconsistency.
  • Insert anomaly — You cannot add a new customer who hasn't placed an order yet.
  • Delete anomaly — Deleting order 2 deletes Alice's only record, losing her contact info entirely.

Normalization is the process of structuring a database to eliminate these anomalies by splitting tables and defining relationships.

Key Concepts

Functional Dependency

A functional dependency exists when the value of one attribute determines the value of another.

  • Notation: A → B means "A determines B"
  • Example: student_id → student_name (knowing the student ID uniquely determines the name)

Partial Dependency

A partial dependency occurs when a non-key attribute depends on only part of a composite primary key, not the whole key.

Transitive Dependency

A transitive dependency occurs when A → B and B → C, meaning A → C indirectly. The non-key attribute C depends on another non-key attribute B rather than directly on the primary key.


First Normal Form (1NF)

Rules

  1. Atomic values — each cell contains a single, indivisible value (no lists, arrays, or comma-separated values)
  2. No repeating groups — each column has a unique name; no duplicate columns like phone1, phone2
  3. Unique rows — each row is distinguishable (typically via a primary key)

Before 1NF (violations)

order_idcustomer_nameproducts
1AliceLaptop, Mouse, Keyboard
2BobMonitor

Violations: products contains multiple values in a single cell.

After 1NF

order_idcustomer_nameproduct
1AliceLaptop
1AliceMouse
1AliceKeyboard
2BobMonitor

Now every cell is atomic. But we introduced a new problem: customer_name repeats for every product in the same order — that's redundancy, addressed by 2NF.


Second Normal Form (2NF)

Rules

  1. Must be in 1NF
  2. No partial dependencies — every non-key attribute must depend on the entire primary key, not just part of it

2NF only applies to tables with composite primary keys. If your table has a single-column primary key, it is automatically in 2NF (as long as it's in 1NF).

Before 2NF (violations)

Consider a table tracking which student takes which course, with the instructor for each course:

student_idcourse_idstudent_namecourse_nameinstructor
1CS101AliceIntro to CSProf. Smith
1MATH200AliceCalculus IIProf. Jones
2CS101BobIntro to CSProf. Smith

Composite PK: (student_id, course_id)

Dependencies:

  • (student_id, course_id) → instructor ✅ full dependency
  • student_id → student_namepartial dependency — depends on only part of the key
  • course_id → course_name, instructorpartial dependency — depends on only part of the key

After 2NF — Split into three tables

enrollments (junction table):

student_idcourse_id
1CS101
1MATH200
2CS101

students:

student_idstudent_name
1Alice
2Bob

courses:

course_idcourse_nameinstructor
CS101Intro to CSProf. Smith
MATH200Calculus IIProf. Jones

Now every non-key attribute depends on the whole primary key of its table.


Third Normal Form (3NF)

Rules

  1. Must be in 2NF
  2. No transitive dependencies — non-key attributes must depend on the primary key directly, not through another non-key attribute

Before 3NF (violations)

emp_idemp_namedept_iddept_namedept_location
1AliceD01EngineeringFloor 3
2BobD01EngineeringFloor 3
3CarolD02MarketingFloor 1

Dependencies:

  • emp_id → emp_name, dept_id, dept_name, dept_location
  • dept_id → dept_name, dept_locationtransitive dependency
  • So: emp_id → dept_id → dept_name, dept_locationdept_name and dept_location depend on dept_id, not directly on emp_id

This causes: if Engineering moves to Floor 5, you must update every employee in that department — update anomaly.

After 3NF — Split into two tables

employees:

emp_idemp_namedept_id
1AliceD01
2BobD01
3CarolD02

departments:

dept_iddept_namedept_location
D01EngineeringFloor 3
D02MarketingFloor 1

Now dept_name and dept_location depend only on dept_id (the primary key of departments), not transitively through emp_id.


Summary: 1NF → 2NF → 3NF

Normal FormRequirementEliminates
1NFAtomic values, no repeating groupsMulti-valued cells, duplicate columns
2NF1NF + no partial dependenciesRedundancy from composite key subsets
3NF2NF + no transitive dependenciesRedundancy from non-key → non-key dependencies

Quick check for each form:

  • 1NF: Is every cell a single value? Are there no repeating groups?
  • 2NF: Does every non-key column depend on the whole primary key? (Only relevant for composite keys)
  • 3NF: Does every non-key column depend on the primary key directly, not through another non-key column?

Full Example: Step-by-Step Transformation

Unnormalized table

order_iddatecustomer_idcustomer_namecustomer_cityproduct_idproduct_namequantityunit_price
10012024-01-15C50AliceHanoiP01Laptop1999
10012024-01-15C50AliceHanoiP02Mouse225
10022024-01-16C51BobSaigonP01Laptop1999

After 1NF — already atomic ✅

Every cell is a single value. No repeating groups.

After 2NF — check partial dependencies

PK: (order_id, product_id) — a composite key.

  • order_id → date, customer_id, customer_name, customer_city ❌ partial — split out orders
  • product_id → product_name, unit_price ❌ partial — split out products
  • (order_id, product_id) → quantity ✅ full — stays in order_items

Result: orders, order_items, products

After 3NF — check transitive dependencies

In the orders table: customer_id → customer_name, customer_citycustomer_name and customer_city depend on customer_id, not directly on order_id. This is a transitive dependency.

Result: split orders into orders and customers.

Final normalized schema (3NF)

customers:

customer_idcustomer_namecustomer_city
C50AliceHanoi
C51BobSaigon

orders:

order_idorder_datecustomer_id
10012024-01-15C50
10022024-01-16C51

products:

product_idproduct_nameunit_price
P01Laptop999
P02Mouse25

order_items:

order_idproduct_idquantity
1001P011
1001P022
1002P011

When to Denormalize

Normalized schemas minimize redundancy but can increase JOIN complexity and query latency. In practice, denormalization is common for:

ScenarioWhat to do
Read-heavy workloadsDuplicate frequently accessed columns to avoid JOINs
Reporting / OLAPPre-aggregate data into summary tables or materialized views
High-latency JOINsEmbed related data (e.g., store customer_name in orders for fast display)
Distributed databasesCo-locate related data to minimize cross-node queries
tip

Normalize first, denormalize deliberately. Start with 3NF as your baseline. Only denormalize when you have a measurable performance problem and a clear benefit.

Common Pitfalls

  • Stopping at 1NF — multi-valued cells "work" initially but cause parsing nightmares and prevent indexing
  • Over-normalizing — splitting into too many tiny tables makes queries unreadably complex with excessive JOINs
  • Forgetting to index foreign keys — normalized schemas rely heavily on JOINs; missing indexes on foreign keys kills performance
  • Denormalizing too early — premature denormalization introduces the very anomalies normalization was designed to prevent
  • Ignoring business rules — normalization is a structural tool; complex business rules may still require application-level validation

Interview Questions

1. What is database normalization and why is it important?

Normalization is the process of organizing a database into tables and columns to minimize redundancy and eliminate data anomalies (insert, update, delete). It ensures each piece of data is stored in exactly one place, making the database easier to maintain and more consistent.

2. Explain 1NF, 2NF, and 3NF in simple terms.

Normal FormIn Simple Terms
1NFOne value per cell. No lists or comma-separated values in a column.
2NFEvery non-key column depends on the whole primary key. (No partial dependencies.)
3NFEvery non-key column depends only on the primary key. (No transitive dependencies.)

A common mnemonic: "Every non-key attribute must provide a fact about the key (1NF), the whole key (2NF), and nothing but the key (3NF)."

3. What is the difference between a partial dependency and a transitive dependency?

  • Partial dependency: a non-key attribute depends on part of a composite key (e.g., student_name depends only on student_id, not on (student_id, course_id)). Relevant only for composite primary keys.
  • Transitive dependency: a non-key attribute depends on another non-key attribute (e.g., dept_name depends on dept_id, which depends on emp_id). Can exist even with single-column primary keys.

4. Is a table with a single-column primary key automatically in 2NF?

Yes. 2NF requires eliminating partial dependencies — where a non-key column depends on only part of the primary key. If the primary key is a single column, there is no "part of it" to partially depend on, so no partial dependencies can exist.

5. When would you denormalize a database?

Denormalize when query performance is more important than storage efficiency:

  • Read-heavy systems where JOINs are a bottleneck
  • Reporting and analytics dashboards that need pre-aggregated data
  • CQRS architectures where read models are separate from write models
  • Distributed databases where cross-node JOINs are expensive

Always normalize first, then denormalize with a specific performance goal in mind.

6. What anomalies does normalization prevent?

AnomalyDescriptionExample
Insert anomalyCannot insert a record without unrelated dataCannot add a new department until an employee is assigned to it
Update anomalyUpdating one value requires updating multiple rowsChanging a customer's email requires updating every order row
Delete anomalyDeleting one record unintentionally removes other dataDeleting the last order for a customer removes the customer entirely

Learn More