Module P-4·24 min read

Normalisation, denormalisation trade-offs, correct data types for money and time, soft deletes, and schema migration tools.

P-4 — Schema Design for Real Applications

Who this module is for: You can write SQL and use constraints. Now you need to design schemas that hold up under real conditions: changing requirements, growing data, and teams who make mistakes. This module covers the principles and decisions that distinguish schemas built to last from schemas that cause pain six months into production.

Normalisation in Plain English

Normalisation is the process of organising tables to reduce data redundancy and improve integrity. Academics describe it in terms of normal forms (1NF, 2NF, 3NF). In practice, you need one intuitive rule:

Each fact should be stored in exactly one place.

If the same piece of information is stored in multiple rows or multiple tables, changing it requires updating multiple places — and when someone forgets one, your data becomes inconsistent.

The redundancy problem

sql
-- ❌ DENORMALISED: customer city repeated in every order
CREATE TABLE orders_bad (
  id            BIGSERIAL PRIMARY KEY,
  customer_name TEXT NOT NULL,
  customer_city TEXT NOT NULL,   -- repeated for every order
  product       TEXT NOT NULL,
  amount        NUMERIC NOT NULL
);

-- Alice changes cities — you must update EVERY row with her name
-- Miss one row and Alice now lives in two cities simultaneously

sql
-- ✅ NORMALISED: city stored once in the customer record
CREATE TABLE customers (
  id   BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  city TEXT NOT NULL
);

CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customers(id),
  product     TEXT NOT NULL,
  amount      NUMERIC NOT NULL
);
-- Alice's city is updated in one place — every order reflects it automatically

When to intentionally denormalise

Normalisation is the default. Denormalisation is a deliberate trade-off:

High read volume, low update frequency: a reporting table that aggregates data from many tables — it is faster to pre-compute and store than to join 8 tables on every read
Historical snapshots: an orders table should store the price at the time of purchase, not reference the current products.price — prices change, but the order price should not
Audit logs: copy the full state of a row when it changes — redundancy is intentional

sql
-- CORRECT: storing price at order time, not referencing current price
CREATE TABLE order_items (
  id          BIGSERIAL PRIMARY KEY,
  order_id    BIGINT NOT NULL REFERENCES orders(id),
  product_id  BIGINT NOT NULL REFERENCES products(id),
  quantity    INTEGER NOT NULL,
  unit_price  NUMERIC(10,2) NOT NULL  -- snapshot of price when ordered
  -- NOT a foreign key to products.price — that would reflect future prices
);

Choosing the Right Data Types

Money: `NUMERIC`, Not `FLOAT`

sql
-- ❌ WRONG: floating-point arithmetic loses precision
CREATE TABLE invoices_bad (total FLOAT);
INSERT INTO invoices_bad VALUES (0.1 + 0.2);
SELECT total FROM invoices_bad;
-- 0.30000000000000004  ← wrong! will cause rounding errors in financial totals

-- ✅ CORRECT: NUMERIC is exact
CREATE TABLE invoices (total NUMERIC(15, 2));
INSERT INTO invoices VALUES (0.1 + 0.2);
SELECT total FROM invoices;
-- 0.30  ← exact

-- Alternative: store amounts in the smallest unit as BIGINT
-- 100 = $1.00, 4999 = $49.99 — eliminates decimal entirely
CREATE TABLE orders (total_cents BIGINT NOT NULL);

Timestamps: `TIMESTAMPTZ`, Always

sql
-- ❌ WRONG: no timezone context
created_at TIMESTAMP   -- is this UTC? server local time? user local time?

-- ✅ CORRECT: timezone-aware
created_at TIMESTAMPTZ -- stored as UTC, displayed in session timezone

-- What goes wrong with bare TIMESTAMP:
-- Your server is in UTC. Your user is in Tokyo.
-- You store NOW() = '2026-05-17 10:00:00'
-- Your server moves to AWS us-east-1 (UTC-4 in winter)
-- The same value now reads as '2026-05-17 06:00:00' — 4 hours earlier
-- With TIMESTAMPTZ, the offset is stored — this never happens

Enumerations: `TEXT` with `CHECK` vs `ENUM` type

sql
-- Option 1: TEXT with CHECK constraint (recommended for most cases)
status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'suspended', 'deleted'))
-- Adding new values: ALTER TABLE users DROP CONSTRAINT users_status_check;
--                    ALTER TABLE users ADD CONSTRAINT users_status_check
--                    CHECK (status IN ('pending', 'active', 'suspended', 'deleted', 'archived'));

-- Option 2: PostgreSQL ENUM type
CREATE TYPE user_status AS ENUM ('pending', 'active', 'suspended', 'deleted');
CREATE TABLE users (status user_status NOT NULL);
-- Adding new values: ALTER TYPE user_status ADD VALUE 'archived';
-- ⚠️ Cannot remove values from an ENUM without recreating it
-- ⚠️ ENUM type comparisons are case-sensitive and order-dependent

Recommendation: use TEXT NOT NULL CHECK (... IN ...) for flexibility. Use ENUM only when you need the type enforced at the type system level or need ordering (e.g., priority levels).

Boolean Columns

sql
-- ✅ CORRECT: explicit NOT NULL with a sensible default
is_published BOOLEAN NOT NULL DEFAULT false
is_verified  BOOLEAN NOT NULL DEFAULT false

-- ❌ PROBLEMATIC: nullable boolean has three states (true/false/unknown)
is_admin BOOLEAN  -- can be NULL, which might mean "we don't know"
-- Use NULL-able boolean only when "unknown" is genuinely meaningful

`UUID` vs `BIGSERIAL`

sql
-- BIGSERIAL: auto-incrementing, sequential, fast for indexes
id BIGSERIAL PRIMARY KEY
-- ✅ Sequential inserts are fast (B-tree inserts at the end)
-- ✅ Human-readable IDs in URLs and logs
-- ❌ IDs are guessable (security concern for some APIs)
-- ❌ Cannot generate IDs client-side

-- UUID v4: random, globally unique
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
-- ✅ Globally unique — safe across distributed systems
-- ✅ Not guessable — safe for public-facing IDs
-- ❌ Random UUIDs cause index fragmentation (inserts scatter across B-tree)
-- ❌ Larger size (16 bytes vs 8 bytes)

-- UUID v7: time-ordered, globally unique (best of both)
-- Available via the 'pg_uuidv7' extension or generated by application
id UUID PRIMARY KEY DEFAULT gen_random_uuid()  -- use UUIDv7 from application
-- ✅ Sequential like BIGSERIAL — no index fragmentation
-- ✅ Globally unique and unguessable
-- ✅ Encodes creation time

Rule of thumb: use BIGSERIAL for internal tables where sequential IDs are fine. Use UUID (v7 preferred) for public-facing IDs or distributed systems.

Standard Schema Patterns

Audit Fields — On Every Table

Every table in a production system should have these columns:

sql
CREATE TABLE products (
  id         BIGSERIAL    PRIMARY KEY,
  -- ... your columns ...
  created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

updated_at does not update automatically — you must update it in your queries or use a trigger:

sql
-- Trigger to auto-update updated_at
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_set_updated_at
BEFORE UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- Now every UPDATE automatically sets updated_at = NOW()
UPDATE products SET price = 119.99 WHERE id = 1;
-- updated_at is automatically set

Soft Deletes — The `deleted_at` Pattern

Hard deletion (DELETE) permanently removes data and breaks audit trails. Soft deletion marks rows as deleted without removing them.

sql
CREATE TABLE users (
  id         BIGSERIAL    PRIMARY KEY,
  email      TEXT         NOT NULL UNIQUE,
  -- ... other columns ...
  deleted_at TIMESTAMPTZ  -- NULL = active, non-NULL = soft-deleted
);

-- "Delete" a user (mark as deleted, not actually removed)
UPDATE users SET deleted_at = NOW() WHERE id = $user_id;

-- Query only active users (common — put this in all relevant queries)
SELECT * FROM users WHERE deleted_at IS NULL;

-- Or create a VIEW for convenience
CREATE VIEW active_users AS
  SELECT * FROM users WHERE deleted_at IS NULL;

The soft delete trade-off:

Benefit: data is preserved for audit, recovery, and analytics
Cost: every query must include WHERE deleted_at IS NULL — forgetting it returns deleted records
Cost: unique constraints become complicated (two users with same email — one deleted, one active)

sql
-- Handling UNIQUE constraint with soft deletes:
-- A simple UNIQUE on email blocks creating a new account for a deleted email

-- Solution 1: partial unique index (only enforce uniqueness for active users)
CREATE UNIQUE INDEX users_email_active_unique
ON users (email)
WHERE deleted_at IS NULL;
-- Two rows with same email allowed if at least one has deleted_at set

-- Solution 2: add a is_deleted boolean (less flexible)
-- Solution 3: move deleted records to a separate archive table

Multi-Tenancy — Isolating Customer Data

For SaaS applications serving multiple organisations:

sql
-- Every table has a tenant/organisation reference
CREATE TABLE organisations (
  id   BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE projects (
  id       BIGSERIAL PRIMARY KEY,
  org_id   BIGINT NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
  name     TEXT NOT NULL
);

CREATE TABLE tasks (
  id         BIGSERIAL PRIMARY KEY,
  org_id     BIGINT NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
  project_id BIGINT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
  title      TEXT NOT NULL
);

-- Every query filters by org_id — never show one org's data to another
SELECT * FROM tasks WHERE org_id = $current_org_id AND project_id = $project_id;

-- Index for multi-tenant performance
CREATE INDEX idx_tasks_org_project ON tasks (org_id, project_id);

Row-Level Security (covered in P-6) can enforce this automatically at the database level.

Schema Migration Tools

In Phase 1, we touched on migrations conceptually. Here is the practical toolkit for mid-level engineers.

What a migration file looks like

sql
-- migrations/20260517_001_add_task_labels.sql
-- Up migration (apply change)
CREATE TABLE labels (
  id      BIGSERIAL PRIMARY KEY,
  org_id  BIGINT NOT NULL REFERENCES organisations(id),
  name    TEXT   NOT NULL,
  color   TEXT   NOT NULL DEFAULT '#3B82F6',
  UNIQUE (org_id, name)
);

CREATE TABLE task_labels (
  task_id  BIGINT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
  label_id BIGINT NOT NULL REFERENCES labels(id) ON DELETE CASCADE,
  PRIMARY KEY (task_id, label_id)
);

CREATE INDEX idx_task_labels_task ON task_labels (task_id);
CREATE INDEX idx_task_labels_label ON task_labels (label_id);

`node-pg-migrate` (Node.js)

javascript
// migrations/20260517001-add-task-labels.js
exports.up = async (pgm) => {
  pgm.createTable('labels', {
    id: { type: 'bigserial', primaryKey: true },
    org_id: { type: 'bigint', notNull: true, references: 'organisations(id)' },
    name: { type: 'text', notNull: true },
    color: { type: 'text', notNull: true, default: "'#3B82F6'" },
  });
  pgm.addConstraint('labels', 'labels_org_name_unique', 'UNIQUE (org_id, name)');
};

exports.down = async (pgm) => {
  pgm.dropTable('labels');
};

bash
# Run pending migrations
DATABASE_URL=postgresql://... node-pg-migrate up

# Roll back last migration
DATABASE_URL=postgresql://... node-pg-migrate down

Prisma Migrate (Node.js / TypeScript)

prisma
// schema.prisma
model Task {
  id        BigInt   @id @default(autoincrement())
  orgId     BigInt
  projectId BigInt
  title     String
  status    String   @default("todo")
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  org     Organisation @relation(fields: [orgId], references: [id])
  project Project      @relation(fields: [projectId], references: [id])
  labels  TaskLabel[]
}

bash
# Generate and apply a migration from schema changes
npx prisma migrate dev --name add-task-labels

# Apply migrations in production
npx prisma migrate deploy

Safe schema changes in production

Not all ALTER TABLE operations are instant. Know which are dangerous:

sql
-- ✅ Instant (no table rewrite):
ALTER TABLE tasks ADD COLUMN priority INTEGER DEFAULT 3;  -- PG 11+
ALTER TABLE tasks ADD COLUMN notes TEXT;
ALTER TABLE tasks ALTER COLUMN status SET DEFAULT 'todo';
CREATE INDEX CONCURRENTLY idx_tasks_status ON tasks (status);

-- ⚠️ Potentially slow (may lock or rewrite table):
ALTER TABLE tasks ALTER COLUMN price TYPE NUMERIC(15,2);  -- full rewrite
ALTER TABLE tasks ADD COLUMN required TEXT NOT NULL;       -- PG < 11: full rewrite
-- PG 11+: adding NOT NULL with a DEFAULT is instant

-- ✅ Safe pattern for adding a NOT NULL column to a live table:
-- Step 1: add nullable
ALTER TABLE tasks ADD COLUMN region TEXT;
-- Step 2: backfill (in batches for large tables)
UPDATE tasks SET region = 'us-east' WHERE region IS NULL;
-- Step 3: add NOT NULL constraint
ALTER TABLE tasks ALTER COLUMN region SET NOT NULL;

Practical Exercise: Redesign a Real Schema

Start with a poorly designed schema and improve it:

sql
-- ❌ The original bad schema (a real example of common mistakes)
CREATE TABLE user_orders (
  id           SERIAL,              -- should be BIGSERIAL
  user_email   TEXT,                -- missing NOT NULL and UNIQUE
  user_name    TEXT,                -- should reference a users table
  product_name TEXT,                -- should reference a products table
  price        FLOAT,               -- should be NUMERIC
  order_date   TIMESTAMP,           -- should be TIMESTAMPTZ
  is_deleted   BOOLEAN              -- missing NOT NULL DEFAULT false
);

Redesign it:

sql
-- ✅ Normalised, correctly typed schema
CREATE TABLE users (
  id         BIGSERIAL    PRIMARY KEY,
  name       TEXT         NOT NULL,
  email      TEXT         NOT NULL UNIQUE,
  created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

CREATE TABLE products (
  id    BIGSERIAL       PRIMARY KEY,
  name  TEXT            NOT NULL,
  price NUMERIC(10, 2)  NOT NULL CHECK (price >= 0)
);

CREATE TABLE orders (
  id          BIGSERIAL     PRIMARY KEY,
  user_id     BIGINT        NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
  ordered_at  TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
  deleted_at  TIMESTAMPTZ   -- soft delete
);

CREATE TABLE order_items (
  id         BIGSERIAL      PRIMARY KEY,
  order_id   BIGINT         NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id BIGINT         NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
  quantity   INTEGER        NOT NULL CHECK (quantity > 0),
  unit_price NUMERIC(10,2)  NOT NULL  -- snapshot of price at order time
);

-- Indexes
CREATE INDEX idx_orders_user    ON orders (user_id);
CREATE INDEX idx_order_items_order ON order_items (order_id);
CREATE INDEX idx_order_items_product ON order_items (product_id);
CREATE UNIQUE INDEX users_email_active ON users (email) WHERE deleted_at IS NULL;

Summary

Decision	Recommended Choice	Why
Money	`NUMERIC(15,2)` or `BIGINT` (cents)	Exact arithmetic; FLOAT loses precision
Timestamps	`TIMESTAMPTZ`	Stores UTC; survives timezone changes
String enums	`TEXT NOT NULL CHECK (... IN ...)`	Easy to add values; no type migration needed
IDs	`BIGSERIAL` or UUID v7	BIGSERIAL for internal; UUID v7 for public-facing
Audit fields	`created_at`, `updated_at` on every table	Essential for debugging and data quality
Soft delete	`deleted_at TIMESTAMPTZ`	Preserves history; use partial unique indexes
NOT NULL new columns	Add nullable → backfill → add NOT NULL	Safe on live tables
`updated_at` auto-update	Trigger calling `SET NEW.updated_at = NOW()`	Consistent across all update paths
Normalisation	Default to normalised; denormalise only with data	Prevents inconsistent data

Module P-5 covers JSON and JSONB — working with semi-structured data in PostgreSQL, when it is the right tool, and when it is a schema design shortcut you will regret.

Next: P-5 — JSON and JSONB — Working With Semi-Structured Data →

PreviousModule P-3: Transactions and ACID in Practice Next Module P-5: JSON and JSONB — Working With Semi-Structured Data