Back/Module P-4 Schema Design for Real Applications
Module P-4·24 min read

Normalisation, denormalisation trade-offs, correct data types for money and time, soft deletes, and schema migration tools.

P-4 — Schema Design for Real Applications

Who this module is for: You can write SQL and use constraints. Now you need to design schemas that hold up under real conditions: changing requirements, growing data, and teams who make mistakes. This module covers the principles and decisions that distinguish schemas built to last from schemas that cause pain six months into production.


Normalisation in Plain English

Normalisation is the process of organising tables to reduce data redundancy and improve integrity. Academics describe it in terms of normal forms (1NF, 2NF, 3NF). In practice, you need one intuitive rule:

Each fact should be stored in exactly one place.

If the same piece of information is stored in multiple rows or multiple tables, changing it requires updating multiple places — and when someone forgets one, your data becomes inconsistent.

The redundancy problem

sql
-- ❌ DENORMALISED: customer city repeated in every order CREATE TABLE orders_bad ( id BIGSERIAL PRIMARY KEY, customer_name TEXT NOT NULL, customer_city TEXT NOT NULL, -- repeated for every order product TEXT NOT NULL, amount NUMERIC NOT NULL ); -- Alice changes cities — you must update EVERY row with her name -- Miss one row and Alice now lives in two cities simultaneously
sql
-- ✅ NORMALISED: city stored once in the customer record CREATE TABLE customers ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL ); CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, customer_id BIGINT NOT NULL REFERENCES customers(id), product TEXT NOT NULL, amount NUMERIC NOT NULL ); -- Alice's city is updated in one place — every order reflects it automatically

When to intentionally denormalise

Normalisation is the default. Denormalisation is a deliberate trade-off:

  • High read volume, low update frequency: a reporting table that aggregates data from many tables — it is faster to pre-compute and store than to join 8 tables on every read
  • Historical snapshots: an orders table should store the price at the time of purchase, not reference the current products.price — prices change, but the order price should not
  • Audit logs: copy the full state of a row when it changes — redundancy is intentional
sql
-- CORRECT: storing price at order time, not referencing current price CREATE TABLE order_items ( id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(id), product_id BIGINT NOT NULL REFERENCES products(id), quantity INTEGER NOT NULL, unit_price NUMERIC(10,2) NOT NULL -- snapshot of price when ordered -- NOT a foreign key to products.price — that would reflect future prices );

Choosing the Right Data Types

Money: NUMERIC, Not FLOAT

sql
-- ❌ WRONG: floating-point arithmetic loses precision CREATE TABLE invoices_bad (total FLOAT); INSERT INTO invoices_bad VALUES (0.1 + 0.2); SELECT total FROM invoices_bad; -- 0.30000000000000004 ← wrong! will cause rounding errors in financial totals -- ✅ CORRECT: NUMERIC is exact CREATE TABLE invoices (total NUMERIC(15, 2)); INSERT INTO invoices VALUES (0.1 + 0.2); SELECT total FROM invoices; -- 0.30 ← exact -- Alternative: store amounts in the smallest unit as BIGINT -- 100 = $1.00, 4999 = $49.99 — eliminates decimal entirely CREATE TABLE orders (total_cents BIGINT NOT NULL);

Timestamps: TIMESTAMPTZ, Always

sql
-- ❌ WRONG: no timezone context created_at TIMESTAMP -- is this UTC? server local time? user local time? -- ✅ CORRECT: timezone-aware created_at TIMESTAMPTZ -- stored as UTC, displayed in session timezone -- What goes wrong with bare TIMESTAMP: -- Your server is in UTC. Your user is in Tokyo. -- You store NOW() = '2026-05-17 10:00:00' -- Your server moves to AWS us-east-1 (UTC-4 in winter) -- The same value now reads as '2026-05-17 06:00:00' — 4 hours earlier -- With TIMESTAMPTZ, the offset is stored — this never happens

Enumerations: TEXT with CHECK vs ENUM type

sql
-- Option 1: TEXT with CHECK constraint (recommended for most cases) status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'suspended', 'deleted')) -- Adding new values: ALTER TABLE users DROP CONSTRAINT users_status_check; -- ALTER TABLE users ADD CONSTRAINT users_status_check -- CHECK (status IN ('pending', 'active', 'suspended', 'deleted', 'archived')); -- Option 2: PostgreSQL ENUM type CREATE TYPE user_status AS ENUM ('pending', 'active', 'suspended', 'deleted'); CREATE TABLE users (status user_status NOT NULL); -- Adding new values: ALTER TYPE user_status ADD VALUE 'archived'; -- ⚠️ Cannot remove values from an ENUM without recreating it -- ⚠️ ENUM type comparisons are case-sensitive and order-dependent

Recommendation: use TEXT NOT NULL CHECK (... IN ...) for flexibility. Use ENUM only when you need the type enforced at the type system level or need ordering (e.g., priority levels).

Boolean Columns

sql
-- ✅ CORRECT: explicit NOT NULL with a sensible default is_published BOOLEAN NOT NULL DEFAULT false is_verified BOOLEAN NOT NULL DEFAULT false -- ❌ PROBLEMATIC: nullable boolean has three states (true/false/unknown) is_admin BOOLEAN -- can be NULL, which might mean "we don't know" -- Use NULL-able boolean only when "unknown" is genuinely meaningful

UUID vs BIGSERIAL

sql
-- BIGSERIAL: auto-incrementing, sequential, fast for indexes id BIGSERIAL PRIMARY KEY -- ✅ Sequential inserts are fast (B-tree inserts at the end) -- ✅ Human-readable IDs in URLs and logs -- ❌ IDs are guessable (security concern for some APIs) -- ❌ Cannot generate IDs client-side -- UUID v4: random, globally unique id UUID PRIMARY KEY DEFAULT gen_random_uuid() -- ✅ Globally unique — safe across distributed systems -- ✅ Not guessable — safe for public-facing IDs -- ❌ Random UUIDs cause index fragmentation (inserts scatter across B-tree) -- ❌ Larger size (16 bytes vs 8 bytes) -- UUID v7: time-ordered, globally unique (best of both) -- Available via the 'pg_uuidv7' extension or generated by application id UUID PRIMARY KEY DEFAULT gen_random_uuid() -- use UUIDv7 from application -- ✅ Sequential like BIGSERIAL — no index fragmentation -- ✅ Globally unique and unguessable -- ✅ Encodes creation time

Rule of thumb: use BIGSERIAL for internal tables where sequential IDs are fine. Use UUID (v7 preferred) for public-facing IDs or distributed systems.


Standard Schema Patterns

Audit Fields — On Every Table

Every table in a production system should have these columns:

sql
CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, -- ... your columns ... created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() );

updated_at does not update automatically — you must update it in your queries or use a trigger:

sql
-- Trigger to auto-update updated_at CREATE OR REPLACE FUNCTION set_updated_at() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = NOW(); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER products_set_updated_at BEFORE UPDATE ON products FOR EACH ROW EXECUTE FUNCTION set_updated_at(); -- Now every UPDATE automatically sets updated_at = NOW() UPDATE products SET price = 119.99 WHERE id = 1; -- updated_at is automatically set

Soft Deletes — The deleted_at Pattern

Hard deletion (DELETE) permanently removes data and breaks audit trails. Soft deletion marks rows as deleted without removing them.

sql
CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email TEXT NOT NULL UNIQUE, -- ... other columns ... deleted_at TIMESTAMPTZ -- NULL = active, non-NULL = soft-deleted ); -- "Delete" a user (mark as deleted, not actually removed) UPDATE users SET deleted_at = NOW() WHERE id = $user_id; -- Query only active users (common — put this in all relevant queries) SELECT * FROM users WHERE deleted_at IS NULL; -- Or create a VIEW for convenience CREATE VIEW active_users AS SELECT * FROM users WHERE deleted_at IS NULL;

The soft delete trade-off:

  • Benefit: data is preserved for audit, recovery, and analytics
  • Cost: every query must include WHERE deleted_at IS NULL — forgetting it returns deleted records
  • Cost: unique constraints become complicated (two users with same email — one deleted, one active)
sql
-- Handling UNIQUE constraint with soft deletes: -- A simple UNIQUE on email blocks creating a new account for a deleted email -- Solution 1: partial unique index (only enforce uniqueness for active users) CREATE UNIQUE INDEX users_email_active_unique ON users (email) WHERE deleted_at IS NULL; -- Two rows with same email allowed if at least one has deleted_at set -- Solution 2: add a is_deleted boolean (less flexible) -- Solution 3: move deleted records to a separate archive table

Multi-Tenancy — Isolating Customer Data

For SaaS applications serving multiple organisations:

sql
-- Every table has a tenant/organisation reference CREATE TABLE organisations ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE projects ( id BIGSERIAL PRIMARY KEY, org_id BIGINT NOT NULL REFERENCES organisations(id) ON DELETE CASCADE, name TEXT NOT NULL ); CREATE TABLE tasks ( id BIGSERIAL PRIMARY KEY, org_id BIGINT NOT NULL REFERENCES organisations(id) ON DELETE CASCADE, project_id BIGINT NOT NULL REFERENCES projects(id) ON DELETE CASCADE, title TEXT NOT NULL ); -- Every query filters by org_id — never show one org's data to another SELECT * FROM tasks WHERE org_id = $current_org_id AND project_id = $project_id; -- Index for multi-tenant performance CREATE INDEX idx_tasks_org_project ON tasks (org_id, project_id);

Row-Level Security (covered in P-6) can enforce this automatically at the database level.


Schema Migration Tools

In Phase 1, we touched on migrations conceptually. Here is the practical toolkit for mid-level engineers.

What a migration file looks like

sql
-- migrations/20260517_001_add_task_labels.sql -- Up migration (apply change) CREATE TABLE labels ( id BIGSERIAL PRIMARY KEY, org_id BIGINT NOT NULL REFERENCES organisations(id), name TEXT NOT NULL, color TEXT NOT NULL DEFAULT '#3B82F6', UNIQUE (org_id, name) ); CREATE TABLE task_labels ( task_id BIGINT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE, label_id BIGINT NOT NULL REFERENCES labels(id) ON DELETE CASCADE, PRIMARY KEY (task_id, label_id) ); CREATE INDEX idx_task_labels_task ON task_labels (task_id); CREATE INDEX idx_task_labels_label ON task_labels (label_id);

node-pg-migrate (Node.js)

javascript
// migrations/20260517001-add-task-labels.js exports.up = async (pgm) => { pgm.createTable('labels', { id: { type: 'bigserial', primaryKey: true }, org_id: { type: 'bigint', notNull: true, references: 'organisations(id)' }, name: { type: 'text', notNull: true }, color: { type: 'text', notNull: true, default: "'#3B82F6'" }, }); pgm.addConstraint('labels', 'labels_org_name_unique', 'UNIQUE (org_id, name)'); }; exports.down = async (pgm) => { pgm.dropTable('labels'); };
bash
# Run pending migrations DATABASE_URL=postgresql://... node-pg-migrate up # Roll back last migration DATABASE_URL=postgresql://... node-pg-migrate down

Prisma Migrate (Node.js / TypeScript)

prisma
// schema.prisma model Task { id BigInt @id @default(autoincrement()) orgId BigInt projectId BigInt title String status String @default("todo") createdAt DateTime @default(now()) updatedAt DateTime @updatedAt org Organisation @relation(fields: [orgId], references: [id]) project Project @relation(fields: [projectId], references: [id]) labels TaskLabel[] }
bash
# Generate and apply a migration from schema changes npx prisma migrate dev --name add-task-labels # Apply migrations in production npx prisma migrate deploy

Safe schema changes in production

Not all ALTER TABLE operations are instant. Know which are dangerous:

sql
-- ✅ Instant (no table rewrite): ALTER TABLE tasks ADD COLUMN priority INTEGER DEFAULT 3; -- PG 11+ ALTER TABLE tasks ADD COLUMN notes TEXT; ALTER TABLE tasks ALTER COLUMN status SET DEFAULT 'todo'; CREATE INDEX CONCURRENTLY idx_tasks_status ON tasks (status); -- ⚠️ Potentially slow (may lock or rewrite table): ALTER TABLE tasks ALTER COLUMN price TYPE NUMERIC(15,2); -- full rewrite ALTER TABLE tasks ADD COLUMN required TEXT NOT NULL; -- PG < 11: full rewrite -- PG 11+: adding NOT NULL with a DEFAULT is instant -- ✅ Safe pattern for adding a NOT NULL column to a live table: -- Step 1: add nullable ALTER TABLE tasks ADD COLUMN region TEXT; -- Step 2: backfill (in batches for large tables) UPDATE tasks SET region = 'us-east' WHERE region IS NULL; -- Step 3: add NOT NULL constraint ALTER TABLE tasks ALTER COLUMN region SET NOT NULL;

Practical Exercise: Redesign a Real Schema

Start with a poorly designed schema and improve it:

sql
-- ❌ The original bad schema (a real example of common mistakes) CREATE TABLE user_orders ( id SERIAL, -- should be BIGSERIAL user_email TEXT, -- missing NOT NULL and UNIQUE user_name TEXT, -- should reference a users table product_name TEXT, -- should reference a products table price FLOAT, -- should be NUMERIC order_date TIMESTAMP, -- should be TIMESTAMPTZ is_deleted BOOLEAN -- missing NOT NULL DEFAULT false );

Redesign it:

sql
-- ✅ Normalised, correctly typed schema CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), deleted_at TIMESTAMPTZ ); CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, price NUMERIC(10, 2) NOT NULL CHECK (price >= 0) ); CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT, ordered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), deleted_at TIMESTAMPTZ -- soft delete ); CREATE TABLE order_items ( id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE, product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE RESTRICT, quantity INTEGER NOT NULL CHECK (quantity > 0), unit_price NUMERIC(10,2) NOT NULL -- snapshot of price at order time ); -- Indexes CREATE INDEX idx_orders_user ON orders (user_id); CREATE INDEX idx_order_items_order ON order_items (order_id); CREATE INDEX idx_order_items_product ON order_items (product_id); CREATE UNIQUE INDEX users_email_active ON users (email) WHERE deleted_at IS NULL;

Summary

DecisionRecommended ChoiceWhy
MoneyNUMERIC(15,2) or BIGINT (cents)Exact arithmetic; FLOAT loses precision
TimestampsTIMESTAMPTZStores UTC; survives timezone changes
String enumsTEXT NOT NULL CHECK (... IN ...)Easy to add values; no type migration needed
IDsBIGSERIAL or UUID v7BIGSERIAL for internal; UUID v7 for public-facing
Audit fieldscreated_at, updated_at on every tableEssential for debugging and data quality
Soft deletedeleted_at TIMESTAMPTZPreserves history; use partial unique indexes
NOT NULL new columnsAdd nullable → backfill → add NOT NULLSafe on live tables
updated_at auto-updateTrigger calling SET NEW.updated_at = NOW()Consistent across all update paths
NormalisationDefault to normalised; denormalise only with dataPrevents inconsistent data

Module P-5 covers JSON and JSONB — working with semi-structured data in PostgreSQL, when it is the right tool, and when it is a schema design shortcut you will regret.

Next: P-5 — JSON and JSONB — Working With Semi-Structured Data →

Discussion