Module 11·32 min read

A structured decision framework using storage engine mechanics, consistency models, and scaling limits.

Module 11 — SQL vs NoSQL: An Engineer's Framework, Not a Marketing Debate

What this module covers: The SQL vs NoSQL debate is usually noise. Marketing claims, hype cycles, and cargo-culted architecture decisions dominate the conversation. This module cuts through that. By this point in the course you understand Postgres's internals precisely — MVCC, WAL, the planner, indexing. This module uses the same precision to examine what MongoDB, Redis, Cassandra, and ClickHouse actually do differently at the storage and consistency layer, when those differences matter, and a five-question decision framework for choosing correctly.

The Question Nobody Asks

The usual framing: "Should I use SQL or NoSQL?"

The useful framing: "What are the consistency guarantees, storage mechanics, and scaling characteristics I need — and which database provides exactly that?"

Every database exists because it makes a specific set of trade-offs that some workload needs. MongoDB is not "better" than Postgres. Cassandra is not "more scalable." Redis is not "faster." Each makes different trade-offs, and the right choice depends entirely on which trade-offs your workload can accept.

This module does not give you a winner. It gives you the precision to make the right call for your specific situation.

Postgres vs MongoDB

What MongoDB Actually Is

MongoDB is a document database. Documents are stored as BSON (binary JSON) with flexible schemas — each document in a collection can have different fields. The storage engine (WiredTiger since MongoDB 3.2) uses B-tree structures similar to Postgres for indexes.

The marketing pitch: "flexible schema, horizontal scaling, JSON-native." The engineering reality is more nuanced.

Storage Engine Comparison

Postgres (heap-based, MVCC):

Rows stored in fixed-structure 8KB heap pages
MVCC via xmin/xmax — dead tuples accumulate, autovacuum reclaims them
WAL for durability and replication
Full ACID transactions across multiple tables

MongoDB (WiredTiger, document-level concurrency):

Documents stored in B-tree files, compressed by default (Snappy or zstd)
Concurrency at the document level: multiple writers can write to the same collection simultaneously as long as they touch different documents
Before WiredTiger: collection-level lock (global write lock in very old versions) — this is where "MongoDB doesn't scale for writes" came from, and it has not been true since 2015
Multi-document ACID transactions added in MongoDB 4.0 (2018) — but with higher overhead than Postgres single-row operations

The dead tuple problem: MongoDB's WiredTiger uses a different mechanism than MVCC but has its own version of space reclamation. Deleted and updated documents leave fragmented space that compact reclaims — analogous to VACUUM FULL in Postgres. MongoDB's background compaction is generally less aggressive than Postgres's autovacuum.

Schema Flexibility: When It Matters

MongoDB's flexible schema is a genuine advantage when:

You are still figuring out your data model — you can store documents without upfront schema definition and add fields later without migrations
Different entities in the same collection have genuinely different shapes — product catalog where electronics have voltage/wattage but clothing has size/material
You are ingesting external data you do not control and cannot normalize upfront

MongoDB's flexible schema is a trap when:

You use it to avoid thinking about schema design — you end up with inconsistently structured documents that require application-layer validation instead of database-enforced constraints
Your "documents" are actually relational data — orders, line items, users, payments — that you are embedding in nested documents to avoid joins, but then querying across the nesting
Your documents grow unbounded — MongoDB has a 16MB document size limit; documents with embedded arrays that grow over time hit this limit

Joins: The Real Comparison

Postgres has native, optimized joins. MongoDB has $lookup (aggregation pipeline join). The comparison:

javascript
// MongoDB: find all transactions with their block info
db.transactions.aggregate([
  { $match: { status: "confirmed" } },
  { $lookup: {
    from: "blocks",
    localField: "block_height",
    foreignField: "height",
    as: "block"
  }},
  { $unwind: "$block" }
])

sql
-- Postgres equivalent
SELECT t.*, b.validator, b.timestamp
FROM transactions t
JOIN blocks b ON b.height = t.block_height
WHERE t.status = 'confirmed';

MongoDB's $lookup is a nested loop join — for each transaction, it looks up the block. There is no hash join or merge join. For large collections, this is significantly slower than Postgres's cost-based join selection. MongoDB's query optimizer does not consider join strategies the way Postgres's planner does.

When MongoDB wins on data access: when your access pattern is almost entirely document-centric — fetch a single document by _id, update a single document — and joins are rare. A content management system where each article is a self-contained document with embedded metadata, tags, and author info is a legitimate MongoDB use case.

When Postgres wins: any workload with multi-table queries, aggregations across relationships, or constraints that must be enforced across documents.

Transactions

Pre-MongoDB 4.0: no multi-document transactions. This was the fundamental consistency gap.

Post-MongoDB 4.0: multi-document transactions exist but:

Higher overhead than single-document operations (write concern, distributed lock)
Not enabled by default for all operations (you must explicitly start a session and transaction)
Performance degrades significantly under high transaction concurrency compared to Postgres's MVCC

If your workload requires ACID transactions across multiple documents/tables: Postgres is the correct choice. MongoDB's transactions work, but they are a retrofit onto a system not designed for them.

Postgres vs Redis

What Redis Actually Is

Redis is an in-memory data structure store. It keeps its entire working dataset in RAM. Persistence is optional and asynchronous by default (RDB snapshots or AOF append-only file). It supports strings, hashes, lists, sets, sorted sets, streams, and more as first-class data types.

The marketing pitch: "blazing fast, sub-millisecond latency." This is true, and it is true because Redis does not write to disk on every operation by default.

The Durability Trade-off

Redis's default configuration (save 900 1, save 300 10, save 60 10000):

Saves a snapshot to disk every 15 minutes if at least 1 key changed
Saves every 5 minutes if at least 10 keys changed
Saves every minute if at least 10,000 keys changed

If Redis crashes between saves, you lose everything since the last save. This is an explicit design choice — Redis trades durability for speed.

With appendonly yes and appendfsync always, Redis syncs every write to the AOF file — similar to Postgres's synchronous_commit = on. Latency increases to 1–5ms per write (same as Postgres). The throughput advantage largely disappears.

Redis default (RDB only):         ~100,000 ops/sec, up to 15min data loss on crash
Redis (AOF, appendfsync=always):  ~50,000 ops/sec, durable
Postgres (synchronous_commit=on): ~10,000 TPS, durable, full ACID

Redis is faster than Postgres for the same durability level — but not infinitely faster once durability requirements are equal.

When Redis Is the Right Choice

Caching: Redis's primary legitimate use case. Store the results of expensive Postgres queries, pre-computed aggregations, or session data with a TTL. The in-memory nature and LRU eviction make it ideal.

Rate limiting: Redis's atomic INCR and EXPIRE commands make rate limiting straightforward and fast. Doing this in Postgres requires a table with a FOR UPDATE lock or careful use of advisory locks.

Pub/Sub and queues: Redis Streams and Pub/Sub are purpose-built for message passing. Postgres's LISTEN/NOTIFY exists but is not designed for high-throughput message buses.

Leaderboards and sorted sets: Redis's ZADD/ZRANGE for sorted sets are O(log N) and trivially fast for real-time rankings. Doing this in Postgres requires a table with an ordered index and careful concurrency control.

Session storage: Storing session data (user_id, permissions, cart) in Redis with a TTL is operationally simpler than Postgres for ephemeral, high-frequency reads.

When Redis Is Not the Answer

Source of truth for business data: if losing the last 15 minutes of writes is not acceptable, Redis default config is wrong. And if you configure Redis for full durability, Postgres is simpler and provides ACID guarantees Redis does not.

Complex queries: Redis has no query language beyond key lookups and data structure operations. If you need to aggregate, filter, or join data, you need Postgres.

Large datasets that don't fit in RAM: Redis is memory-bound. A 1TB dataset costs 1TB of RAM. Postgres uses disk efficiently with a much smaller memory footprint for the same dataset.

The common mistake: using Redis as a primary database because it is "faster," then building referential integrity, query logic, and transaction coordination in the application — reimplementing a worse version of what Postgres already provides.

Postgres vs Cassandra

What Cassandra Actually Is

Apache Cassandra is a wide-column store designed for write-heavy, geographically distributed workloads. It is a leaderless distributed database — every node can accept reads and writes, with replication handled via consistent hashing and configurable consistency levels.

The Write Path

Cassandra's write path is fundamentally different from Postgres:

Write is appended to an in-memory MemTable (not flushed to disk immediately)
Write is appended to a CommitLog on disk (for durability — analogous to WAL)
When MemTable fills up, it is flushed to disk as an immutable SSTable (Sorted String Table)
Background compaction merges SSTables, removing deleted/overwritten data

This is a Log-Structured Merge Tree (LSM Tree) — writes are always appends. There is no in-place update. An "update" to a row appends a new version; a "delete" appends a tombstone marker. Compaction eventually removes old versions and tombstones.

Write amplification: low (sequential append). Read amplification: higher (must check multiple SSTables for the current version of a row).

Consistency Model

Cassandra's consistency is configurable per-operation:

ONE:    Write/read acknowledged by 1 replica
QUORUM: Write/read acknowledged by majority of replicas
ALL:    Write/read acknowledged by all replicas

With replication factor 3 and QUORUM for both writes and reads:

Write: 2 of 3 replicas must acknowledge
Read: 2 of 3 replicas must respond, return latest version
Guarantee: any read after a write will see that write (strong consistency)

With ONE for writes and ONE for reads:

Eventual consistency — reads may return stale data
Higher availability and lower latency

Cassandra does not have multi-row transactions. Each write to a partition is atomic, but there is no cross-partition transaction. The application is responsible for handling partial failures.

When Cassandra Is the Right Choice

Extreme write throughput, globally distributed: Cassandra is genuinely better than Postgres for workloads that need to sustain millions of writes per second across multiple data centers. The LSM tree append-only write path is faster than Postgres's heap+WAL write path at extreme scale.

Time-series data with predictable access patterns: a table like (device_id, timestamp, metric_value) partitioned by device_id — writes always append to the partition, reads are time-range queries on a single partition. Cassandra handles this excellently.

Multi-region active-active: Cassandra's leaderless replication allows writes from any region without a primary-standby failover. Postgres requires a primary (single write endpoint) with standbys.

When Cassandra Is Not the Answer

Complex queries: Cassandra's query language (CQL) looks like SQL but is not. You cannot do arbitrary joins, aggregations across partitions, or ad-hoc filtering on non-partition-key columns without full table scans. Data modeling must be query-driven — design your tables around the queries you will run.

Strong consistency requirements across multiple entities: no cross-partition transactions. Financial operations requiring atomic multi-row updates cannot be done in Cassandra without application-level coordination.

Moderate write throughput with complex reads: if you are writing thousands per second but need flexible querying, Postgres with proper indexing and partitioning handles this better.

The common mistake: choosing Cassandra because "it scales to petabytes" when your dataset is 100GB and your queries require joins. You end up maintaining denormalized tables, implementing join logic in the application, and debugging eventual consistency bugs.

Postgres vs ClickHouse / DuckDB

OLAP vs OLTP: The Actual Difference

OLTP (Online Transaction Processing): many small, targeted operations. SELECT * FROM transactions WHERE id = 12345. Row-oriented storage is optimal — fetch a whole row cheaply.

OLAP (Online Analytical Processing): few large aggregations. SELECT date_trunc('day', timestamp), sum(amount) FROM transactions GROUP BY 1. Column-oriented storage is optimal — fetch only the timestamp and amount columns, skip everything else.

Postgres is row-oriented. A query reading only 2 of 20 columns still reads entire 8KB pages containing all 20 columns. For aggregations over billions of rows, this is I/O-wasteful.

ClickHouse and DuckDB are column-oriented. They store each column separately. An aggregation over timestamp and amount reads only those two columns from disk — a 10x I/O reduction on a 20-column table.

ClickHouse

ClickHouse is a column-oriented database designed for real-time analytics at high ingest rates. It uses a MergeTree storage engine (LSM-like for writes, columnar for reads).

Strengths:

Aggregate queries over billions of rows in seconds
Compressed columnar storage (10–50x compression typical)
High ingest throughput (millions of rows/second)
SQL with analytics extensions (window functions, materialized views, sampling)

Weaknesses:

No multi-row transactions
Updates and deletes are expensive (async mutations, not real-time)
Joins are less efficient than Postgres for OLTP patterns
Not designed for point lookups (WHERE id = 12345 is slower than Postgres)

When ClickHouse wins: analytics dashboards querying hundreds of millions of rows, real-time aggregation of event streams, reporting on large time-series datasets.

DuckDB

DuckDB is an in-process analytical database — it runs inside your application process, like SQLite but for analytics. It reads Parquet, CSV, JSON natively.

python
import duckdb
# Query a Parquet file directly — no server, no loading
result = duckdb.query("""
  SELECT date_trunc('day', timestamp), sum(amount)
  FROM 'transactions_2026.parquet'
  GROUP BY 1
  ORDER BY 1
""").fetchdf()

When DuckDB wins: ad-hoc analytics on files, notebook-style data exploration, ETL transformations, replacing pandas for large-file aggregations. Not a production database server.

The HTAP Problem

Many production systems need both OLTP (transactional writes, point lookups) and OLAP (aggregate analytics). The standard patterns:

Pattern 1: ETL pipeline — Postgres for OLTP, nightly ETL to ClickHouse for analytics. Simple, reliable, 24-hour data latency.

Pattern 2: CDC streaming — Postgres for OLTP, Debezium streams changes to ClickHouse in near-real-time. Sub-minute analytics latency, more infrastructure.

Pattern 3: Postgres + Timescale/Citus — Postgres extensions that add columnar storage or distributed query execution. One system, but more operational complexity.

Pattern 4: pg_analytics / paradedb — Postgres extensions that add analytical query capabilities. Emerging, not yet battle-tested at scale.

The Decision Framework: Five Questions

Before choosing a database, answer these five questions honestly.

1. What are your consistency requirements?

Strong ACID across multiple rows/tables: Postgres, MySQL, CockroachDB
Document-level atomicity only: MongoDB (fine for most document operations)
Eventual consistency acceptable: Cassandra, DynamoDB
No durability required (cache only): Redis

2. What is your read/write ratio and pattern?

Many small reads/writes, point lookups: Postgres (OLTP)
Write-heavy, append-mostly: Cassandra (LSM tree), ClickHouse (MergeTree)
Read-heavy aggregations over large datasets: ClickHouse, DuckDB, BigQuery
Mixed OLTP + OLAP: Postgres + ClickHouse via CDC, or Timescale

3. What are your query patterns? Do you know them upfront?

Complex, ad-hoc SQL: Postgres — the planner handles arbitrary query shapes
Predictable, pre-defined access patterns: Cassandra (model your tables around your queries)
Key-value lookups only: Redis, DynamoDB
Full-text or vector search primary: Elasticsearch, pgvector (Postgres extension)

4. What is your scale? Be honest.

Most applications will never exceed what a single well-tuned Postgres instance handles:

Single Postgres instance: up to ~10TB, ~50,000 TPS on NVMe
Postgres + read replicas: same write capacity, distributed read capacity
Citus (distributed Postgres): horizontal write scaling, SQL compatibility maintained
Cassandra: justified above ~50,000 writes/sec or multi-region active-active
ClickHouse: justified above ~10B rows in the analytics dataset

Choosing Cassandra for a 10GB dataset that gets 100 writes/second is not "planning for scale" — it is adding complexity without benefit. Postgres handles that with one server.

5. What operational complexity can you sustain?

Postgres: well-understood, mature tooling (Patroni, pgBackRest, PgBouncer), abundant expertise
MongoDB: similar maturity, Atlas managed service reduces ops burden
Cassandra: operationally heavy — compaction tuning, repair operations, token rebalancing on node add/remove
ClickHouse: simpler than Cassandra for read-heavy workloads, but mutations and schema changes are operationally different
Redis: simple for cache, complex for primary storage with persistence

The Common Mistake: NoSQL to Avoid Schema Design

The most frequent misuse of NoSQL databases:

"We chose MongoDB because we didn't know our schema yet, and we wanted flexibility."

Six months later:

Documents have inconsistent field names (user_id, userId, UserID)
Required fields are sometimes missing — no NOT NULL enforcement
Queries require $exists checks and $ifNull coalesces that would be unnecessary with a schema
The "flexible schema" has become an inconsistent schema that the application must validate
Indexes were added reactively, not by design
The team is implementing joins in the application layer with multiple round trips

This pattern reinvents a worse version of what Postgres provides — schema enforcement, constraints, referential integrity — at the application layer, where it is harder to maintain and easier to break.

Schema design is hard. It requires thought. NoSQL does not eliminate the need for schema design — it only moves the enforcement from the database to the application.

Summary

Database	Storage Model	Consistency	Best For	Avoid When
Postgres	Row-oriented heap, MVCC	Full ACID	OLTP, complex queries, relational data	Extreme write throughput (>50K TPS), pure OLAP
MongoDB	Document (BSON), WiredTiger	Document-level atomic; multi-doc transactions available	Flexible-schema documents, content management	Multi-entity transactions, complex aggregation
Redis	In-memory data structures	Eventually consistent (default); configurable	Caching, rate limiting, pub/sub, sorted sets	Source of truth, datasets exceeding RAM
Cassandra	LSM tree (SSTable), leaderless	Tunable (ONE to ALL)	Multi-region active-active, extreme write throughput	Complex queries, cross-partition transactions
ClickHouse	Columnar (MergeTree)	Eventual (for mutations)	Aggregate analytics over billions of rows	Point lookups, real-time updates, OLTP
DuckDB	Columnar, in-process	ACID (single-process)	Ad-hoc analytics, file querying, ETL	Production server, concurrent writes

The framework:

Consistency requirements → eliminates eventual-consistency-only options if you need ACID
Read/write pattern → eliminates columnar stores for OLTP, row stores for pure OLAP
Query flexibility → eliminates key-value and wide-column stores if you need ad-hoc SQL
Honest scale assessment → eliminates complex distributed systems if Postgres handles your load
Operational capacity → eliminates systems your team cannot run reliably

When in doubt: start with Postgres. You can always add a specialized store for a specific workload later. Postgres's operational maturity, tooling ecosystem, and feature set mean you rarely need to move away from it until you have very specific, proven scale or workload requirements that it cannot meet.

Module 12 closes the course with the operational side: the monitoring stack, the runbooks, the migration playbooks, and the habits that separate engineers who survive database incidents from those who create them.

Next: Module 12 — Production Operations: Monitoring, Migration, and the Runbooks That Matter →

PreviousModule 10: Replication, High Availability, and the CAP Trade-offs Next Module 12: Production Operations: Monitoring, Migration, and the Runbooks That Matter