Database Normalization (1NF, 2NF, 3NF)

What You'll Learn

By the end of this page, you will be able to:

Explain why normalization matters — what problems it solves
Define 1NF, 2NF, and 3NF with their exact requirements
Transform an unnormalized table step-by-step through each normal form
Identify functional dependencies and transitive dependencies
Know when to denormalize for performance

The Problem: Redundant & Inconsistent Data

Imagine a single table storing everything about orders:

order_id	customer_name	customer_email	product_names	quantities	prices
1	Alice	alice@mail.com	Laptop, Mouse	1, 2	999, 25
2	Alice	alice@mail.com	Keyboard	1	79

Problems:

Update anomaly — Alice's email is duplicated. Updating it in one row but not the other causes inconsistency.
Insert anomaly — You cannot add a new customer who hasn't placed an order yet.
Delete anomaly — Deleting order 2 deletes Alice's only record, losing her contact info entirely.

Normalization is the process of structuring a database to eliminate these anomalies by splitting tables and defining relationships.

Key Concepts

Functional Dependency

A functional dependency exists when the value of one attribute determines the value of another.

Notation: A → B means "A determines B"
Example: student_id → student_name (knowing the student ID uniquely determines the name)

Partial Dependency

A partial dependency occurs when a non-key attribute depends on only part of a composite primary key, not the whole key.

Transitive Dependency

A transitive dependency occurs when A → B and B → C, meaning A → C indirectly. The non-key attribute C depends on another non-key attribute B rather than directly on the primary key.

First Normal Form (1NF)

Rules

Atomic values — each cell contains a single, indivisible value (no lists, arrays, or comma-separated values)
No repeating groups — each column has a unique name; no duplicate columns like phone1, phone2
Unique rows — each row is distinguishable (typically via a primary key)

Before 1NF (violations)

order_id	customer_name	products
1	Alice	Laptop, Mouse, Keyboard
2	Bob	Monitor

Violations: products contains multiple values in a single cell.

After 1NF

order_id	customer_name	product
1	Alice	Laptop
1	Alice	Mouse
1	Alice	Keyboard
2	Bob	Monitor

Now every cell is atomic. But we introduced a new problem: customer_name repeats for every product in the same order — that's redundancy, addressed by 2NF.

Second Normal Form (2NF)

Rules

Must be in 1NF
No partial dependencies — every non-key attribute must depend on the entire primary key, not just part of it

2NF only applies to tables with composite primary keys. If your table has a single-column primary key, it is automatically in 2NF (as long as it's in 1NF).

Before 2NF (violations)

Consider a table tracking which student takes which course, with the instructor for each course:

student_id	course_id	student_name	course_name	instructor
1	CS101	Alice	Intro to CS	Prof. Smith
1	MATH200	Alice	Calculus II	Prof. Jones
2	CS101	Bob	Intro to CS	Prof. Smith

Composite PK: (student_id, course_id)

Dependencies:

(student_id, course_id) → instructor ✅ full dependency
student_id → student_name ❌ partial dependency — depends on only part of the key
course_id → course_name, instructor ❌ partial dependency — depends on only part of the key

After 2NF — Split into three tables

enrollments (junction table):

student_id	course_id
1	CS101
1	MATH200
2	CS101

students:

student_id	student_name
1	Alice
2	Bob

courses:

course_id	course_name	instructor
CS101	Intro to CS	Prof. Smith
MATH200	Calculus II	Prof. Jones

Now every non-key attribute depends on the whole primary key of its table.

Third Normal Form (3NF)

Rules

Must be in 2NF
No transitive dependencies — non-key attributes must depend on the primary key directly, not through another non-key attribute

Before 3NF (violations)

emp_id	emp_name	dept_id	dept_name	dept_location
1	Alice	D01	Engineering	Floor 3
2	Bob	D01	Engineering	Floor 3
3	Carol	D02	Marketing	Floor 1

Dependencies:

emp_id → emp_name, dept_id, dept_name, dept_location ✅
dept_id → dept_name, dept_location ❌ transitive dependency
So: emp_id → dept_id → dept_name, dept_location — dept_name and dept_location depend on dept_id, not directly on emp_id

This causes: if Engineering moves to Floor 5, you must update every employee in that department — update anomaly.

After 3NF — Split into two tables

employees:

emp_id	emp_name	dept_id
1	Alice	D01
2	Bob	D01
3	Carol	D02

departments:

dept_id	dept_name	dept_location
D01	Engineering	Floor 3
D02	Marketing	Floor 1

Now dept_name and dept_location depend only on dept_id (the primary key of departments), not transitively through emp_id.

Summary: 1NF → 2NF → 3NF

Normal Form	Requirement	Eliminates
1NF	Atomic values, no repeating groups	Multi-valued cells, duplicate columns
2NF	1NF + no partial dependencies	Redundancy from composite key subsets
3NF	2NF + no transitive dependencies	Redundancy from non-key → non-key dependencies

Quick check for each form:

1NF: Is every cell a single value? Are there no repeating groups?
2NF: Does every non-key column depend on the whole primary key? (Only relevant for composite keys)
3NF: Does every non-key column depend on the primary key directly, not through another non-key column?

Full Example: Step-by-Step Transformation

Unnormalized table

order_id	date	customer_id	customer_name	customer_city	product_id	product_name	quantity	unit_price
1001	2024-01-15	C50	Alice	Hanoi	P01	Laptop	1	999
1001	2024-01-15	C50	Alice	Hanoi	P02	Mouse	2	25
1002	2024-01-16	C51	Bob	Saigon	P01	Laptop	1	999

After 1NF — already atomic ✅

Every cell is a single value. No repeating groups.

After 2NF — check partial dependencies

PK: (order_id, product_id) — a composite key.

order_id → date, customer_id, customer_name, customer_city ❌ partial — split out orders
product_id → product_name, unit_price ❌ partial — split out products
(order_id, product_id) → quantity ✅ full — stays in order_items

Result: orders, order_items, products

After 3NF — check transitive dependencies

In the orders table: customer_id → customer_name, customer_city — customer_name and customer_city depend on customer_id, not directly on order_id. This is a transitive dependency.

Result: split orders into orders and customers.

Final normalized schema (3NF)

customers:

customer_id	customer_name	customer_city
C50	Alice	Hanoi
C51	Bob	Saigon

orders:

order_id	order_date	customer_id
1001	2024-01-15	C50
1002	2024-01-16	C51

products:

product_id	product_name	unit_price
P01	Laptop	999
P02	Mouse	25

order_items:

order_id	product_id	quantity
1001	P01	1
1001	P02	2
1002	P01	1

When to Denormalize

Normalized schemas minimize redundancy but can increase JOIN complexity and query latency. In practice, denormalization is common for:

Scenario	What to do
Read-heavy workloads	Duplicate frequently accessed columns to avoid JOINs
Reporting / OLAP	Pre-aggregate data into summary tables or materialized views
High-latency JOINs	Embed related data (e.g., store `customer_name` in `orders` for fast display)
Distributed databases	Co-locate related data to minimize cross-node queries

tip

Normalize first, denormalize deliberately. Start with 3NF as your baseline. Only denormalize when you have a measurable performance problem and a clear benefit.

Common Pitfalls

Stopping at 1NF — multi-valued cells "work" initially but cause parsing nightmares and prevent indexing
Over-normalizing — splitting into too many tiny tables makes queries unreadably complex with excessive JOINs
Forgetting to index foreign keys — normalized schemas rely heavily on JOINs; missing indexes on foreign keys kills performance
Denormalizing too early — premature denormalization introduces the very anomalies normalization was designed to prevent
Ignoring business rules — normalization is a structural tool; complex business rules may still require application-level validation

Interview Questions

1. What is database normalization and why is it important?

Normalization is the process of organizing a database into tables and columns to minimize redundancy and eliminate data anomalies (insert, update, delete). It ensures each piece of data is stored in exactly one place, making the database easier to maintain and more consistent.

2. Explain 1NF, 2NF, and 3NF in simple terms.

Normal Form	In Simple Terms
1NF	One value per cell. No lists or comma-separated values in a column.
2NF	Every non-key column depends on the whole primary key. (No partial dependencies.)
3NF	Every non-key column depends only on the primary key. (No transitive dependencies.)

A common mnemonic: "Every non-key attribute must provide a fact about the key (1NF), the whole key (2NF), and nothing but the key (3NF)."

3. What is the difference between a partial dependency and a transitive dependency?

Partial dependency: a non-key attribute depends on part of a composite key (e.g., student_name depends only on student_id, not on (student_id, course_id)). Relevant only for composite primary keys.
Transitive dependency: a non-key attribute depends on another non-key attribute (e.g., dept_name depends on dept_id, which depends on emp_id). Can exist even with single-column primary keys.

4. Is a table with a single-column primary key automatically in 2NF?

Yes. 2NF requires eliminating partial dependencies — where a non-key column depends on only part of the primary key. If the primary key is a single column, there is no "part of it" to partially depend on, so no partial dependencies can exist.

5. When would you denormalize a database?

Denormalize when query performance is more important than storage efficiency:

Read-heavy systems where JOINs are a bottleneck
Reporting and analytics dashboards that need pre-aggregated data
CQRS architectures where read models are separate from write models
Distributed databases where cross-node JOINs are expensive

Always normalize first, then denormalize with a specific performance goal in mind.

6. What anomalies does normalization prevent?

Anomaly	Description	Example
Insert anomaly	Cannot insert a record without unrelated data	Cannot add a new department until an employee is assigned to it
Update anomaly	Updating one value requires updating multiple rows	Changing a customer's email requires updating every order row
Delete anomaly	Deleting one record unintentionally removes other data	Deleting the last order for a customer removes the customer entirely

Learn More

Primary Key vs Foreign Key — how keys enforce relationships between normalized tables
ACID Properties — how transactions guarantee consistency in normalized databases
SQL Joins — querying across normalized tables
Database Indexing — speeding up JOINs between normalized tables

What You'll Learn​

The Problem: Redundant & Inconsistent Data​

Key Concepts​

Functional Dependency​

Partial Dependency​

Transitive Dependency​

First Normal Form (1NF)​

Rules​

Before 1NF (violations)​

After 1NF​

Second Normal Form (2NF)​

Rules​

Before 2NF (violations)​

After 2NF — Split into three tables​

Third Normal Form (3NF)​

Rules​

Before 3NF (violations)​

After 3NF — Split into two tables​

Summary: 1NF → 2NF → 3NF​

Full Example: Step-by-Step Transformation​

Unnormalized table​

After 1NF — already atomic ✅​

After 2NF — check partial dependencies​

After 3NF — check transitive dependencies​

Final normalized schema (3NF)​

When to Denormalize​

Common Pitfalls​

Interview Questions​

1. What is database normalization and why is it important?​

2. Explain 1NF, 2NF, and 3NF in simple terms.​

3. What is the difference between a partial dependency and a transitive dependency?​

4. Is a table with a single-column primary key automatically in 2NF?​

5. When would you denormalize a database?​

6. What anomalies does normalization prevent?​

Learn More​

What You'll Learn

The Problem: Redundant & Inconsistent Data

Key Concepts

Functional Dependency

Partial Dependency

Transitive Dependency

First Normal Form (1NF)

Rules

Before 1NF (violations)

After 1NF

Second Normal Form (2NF)

Rules

Before 2NF (violations)

After 2NF — Split into three tables

Third Normal Form (3NF)

Rules

Before 3NF (violations)

After 3NF — Split into two tables

Summary: 1NF → 2NF → 3NF

Full Example: Step-by-Step Transformation

Unnormalized table

After 1NF — already atomic ✅

After 2NF — check partial dependencies

After 3NF — check transitive dependencies

Final normalized schema (3NF)

When to Denormalize

Common Pitfalls

Interview Questions

1. What is database normalization and why is it important?

2. Explain 1NF, 2NF, and 3NF in simple terms.

3. What is the difference between a partial dependency and a transitive dependency?

4. Is a table with a single-column primary key automatically in 2NF?

5. When would you denormalize a database?

6. What anomalies does normalization prevent?

Learn More