Module P-3

The decision matrix: RDB for snapshots with acceptable data loss, AOF for near-durability, both for maximum recovery, none for pure caches. Recovery time estimates, storage overhead, and when managed Redis changes the calculus.

P-3 — Persistence Decision Framework: RDB vs AOF vs Both vs None

Who this module is for: You have read about RDB and AOF and understand how each works. Now you need to make the right call for your specific use case — and understand why the wrong configuration is worse than no persistence at all. This module gives you the decision framework, the configuration trade-offs, and a production checklist.


The Four Configurations

Redis persistence is not a binary choice — it is a spectrum with four operating modes:

Mode           | Data loss                             | Recovery speed              | Disk I/O        | Use case
---------------|---------------------------------------|-----------------------------|-----------------|--------------------------------------
No persistence | All data lost on restart              | Instant (empty start)       | None            | Pure ephemeral cache
RDB only       | Up to the snapshot interval (minutes) | Fast (binary load)          | Periodic bursts | Tolerable data loss, backup snapshots
AOF only       | Up to the fsync interval (≤ 1 sec)    | Slow (replays commands)     | Continuous      | Near-durability, command log
RDB + AOF      | Up to the fsync interval (≤ 1 sec)    | Fast (RDB base + AOF delta) | Both            | Maximum safety, production default
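A minimal sketch of how the save and appendonly directives map onto these four modes. classify_persistence is a hypothetical helper, not a Redis tool: it reads only those two directives, ignores everything else, and assumes the Redis 7 default save rules (3600 1, 300 100, 60 10000) apply when a config file contains no save line.

```python
# Hypothetical helper (not part of Redis): map a redis.conf fragment onto the four modes.
# Simplified model: only "save" and "appendonly" are read; everything else is ignored.
DEFAULT_SAVE_RULES = [(3600, 1), (300, 100), (60, 10000)]  # assumed Redis 7 defaults


def classify_persistence(conf_text: str) -> str:
    save_rules = None   # None means no save directive seen yet, so defaults apply
    appendonly = False  # appendonly defaults to "no"
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = line.split()
        name = name.lower()
        if name == "save":
            if args == ['""']:
                save_rules = []  # save "" disables all snapshot rules
            else:
                # The first save line replaces the defaults; later lines add to it.
                # Accepts "save 900 1" and the multi-pair form "save 3600 1 300 100".
                save_rules = save_rules or []
                numbers = [int(a) for a in args]
                save_rules.extend(zip(numbers[0::2], numbers[1::2]))
        elif name == "appendonly" and args:
            appendonly = args[0].lower() == "yes"
    if save_rules is None:
        save_rules = list(DEFAULT_SAVE_RULES)
    rdb_enabled = bool(save_rules)
    if rdb_enabled and appendonly:
        return "RDB + AOF"
    if rdb_enabled:
        return "RDB only"
    if appendonly:
        return "AOF only"
    return "No persistence"


print(classify_persistence('save ""\nappendonly no'))  # the Mode 1 config
```

Note the last case the sketch models: a config with no save line at all is not "no persistence", because the default snapshot rules still apply.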

Mode 1: No Persistence

# redis.conf
save ""
appendonly no

Redis runs entirely in RAM. On crash or restart: all data is gone.

When this is correct:

  • Redis is a read-through cache where the primary database is the source of truth
  • Cache misses are acceptable — the application re-populates from the database
  • You explicitly want no disk I/O from Redis

When this is a mistake:

  • Rate limiters (a crash resets all counters)
  • Session tokens (users get logged out)
  • Job queues (pending jobs are lost)
  • Any data that does not exist elsewhere

A common mistake: deploying a "cache" that actually stores authoritative state (sessions, tokens, locks) without persistence. When Redis restarts (deployments, OOM kills, hardware failures), the data loss is silent, and the consequences surface minutes later as logged-out users, reset limits, or lost jobs.


Mode 2: RDB Only

# redis.conf
save 900 1
save 300 100
save 60 10000
appendonly no

When RDB only is correct:

  • Data can tolerate loss up to the snapshot interval (minutes)
  • You need fast restarts (an RDB file is loaded as a compact binary image, while a command-log AOF of the same dataset must re-execute every write, which typically takes much longer)
  • The dataset changes slowly relative to the snapshot interval
  • You need periodic backups for disaster recovery regardless of AOF

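The data loss window: Each save rule fires only when both its time threshold and its change-count threshold have been reached since the last successful snapshot. Under a sustained load of at least 10,000 changes per minute, the save 60 10000 rule snapshots about once a minute, so a crash just before the next snapshot loses up to roughly 60 seconds of writes (plus whatever arrived while the previous BGSAVE was still running, since a snapshot reflects the dataset at fork time). Under lighter loads only the slower rules fire, and the window grows toward 300 seconds (save 300 100) or, with fewer than 100 changes, 900 seconds (save 900 1).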
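To see how write rate interacts with the save rules, here is a small model that assumes writes arrive at a steady rate and that BGSAVE completes instantly (both simplifications). snapshot_interval is a hypothetical name; SAVE_RULES mirrors the Mode 2 config.

```python
# Simplified model: writes arrive at a steady rate and BGSAVE finishes instantly.
# SAVE_RULES mirrors the Mode 2 config: (seconds, minimum number of changes).
SAVE_RULES = [(900, 1), (300, 100), (60, 10000)]


def snapshot_interval(writes_per_sec, rules=SAVE_RULES):
    """Seconds between snapshots, i.e. the worst-case RDB data loss window."""
    if writes_per_sec <= 0:
        return None  # no changes accumulate, so no rule ever fires
    # A rule fires once BOTH its time and its change-count thresholds are reached;
    # the earliest rule to fire determines the interval.
    return min(max(seconds, changes / writes_per_sec) for seconds, changes in rules)


for rate in (200, 100, 1, 0.1):
    print(f"{rate} writes/s -> snapshot every {snapshot_interval(rate):g} s")
```

At 100 writes/s the 60-second rule needs 100 seconds to accumulate 10,000 changes, so the window is about 100 seconds rather than 60; at 1 write/s it is the 300-second rule that fires first.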

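RDB for backups: Even if you use AOF for primary persistence, keep RDB snapshots for long-term backup. BGSAVE returns immediately and the save runs in the background, so wait until LASTSAVE reports a new timestamp before copying the file. LASTSAVE only advances on a successful save, so run the job under a timeout and alert if it never completes:

# Take a snapshot, wait for it to finish, and keep the last 7 daily copies
before=$(redis-cli LASTSAVE)
redis-cli BGSAVE
until [ "$(redis-cli LASTSAVE)" != "$before" ]; do sleep 1; done  # BGSAVE is asynchronous
cp /var/lib/redis/dump.rdb "/backups/redis-$(date +%Y%m%d).rdb"
ls -1t /backups/redis-*.rdb | tail -n +8 | xargs -r rm --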

Mode 3: AOF Only

# redis.conf
save ""
appendonly yes
appendfsync everysec

When AOF only is correct:

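  • You can tolerate losing about 1 second of writes (everysec fsyncs once per second; a stalled disk can stretch the window further)
  • You do not need fast restarts (a large AOF can take minutes to replay)
  • You want a log of recent write commands for debugging (AOF rewrites compact the history, so it is not a complete audit trail)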

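Recovery time: Replay time grows with the number of commands in the AOF. As a rough, hardware-dependent order of magnitude, replaying an AOF that reconstructs 1GB of data takes seconds, while one for a 100GB dataset can take 10 to 30 minutes. While loading, Redis does not serve normal requests; most commands get a LOADING error. These figures describe a pure command log (aof-use-rdb-preamble no); with the preamble enabled (the default in Redis 7), the rewritten base loads as RDB and only the commands written since the last rewrite are replayed.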
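Replay is slow because the log is re-executed one command at a time. A minimal sketch of reading a command-log AOF, assuming plain RESP encoding (each command is an array of bulk strings) and no timestamp annotations; parse_aof and SAMPLE_AOF are hypothetical names for illustration, and Redis itself must also execute every parsed command, which is where most of the time goes.

```python
# Hypothetical reader for a command-log AOF (not a Redis API). Assumes plain RESP:
# every command is "*<argc>\r\n" followed by argc bulk strings "$<len>\r\n<bytes>\r\n",
# and no timestamp annotations.
def parse_aof(data: bytes) -> list:
    commands, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        eol = data.index(b"\r\n", pos)       # ValueError if the header is cut off
        argc = int(data[pos + 1:eol])
        pos = eol + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            eol = data.index(b"\r\n", pos)
            length = int(data[pos + 1:eol])
            start = eol + 2
            if start + length + 2 > len(data):
                raise ValueError("truncated command at end of file")
            args.append(data[start:start + length])
            pos = start + length + 2         # skip the payload and its trailing \r\n
        commands.append(args)
    return commands


SAMPLE_AOF = (b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"
              b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n"
              b"*2\r\n$4\r\nINCR\r\n$7\r\ncounter\r\n")

for command in parse_aof(SAMPLE_AOF):
    print(b" ".join(command).decode())
```

This sketch rejects a truncated final command outright. Redis instead follows aof-load-truncated (yes by default), which loads everything up to the last complete command.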

For large datasets, AOF-only is impractical for production deployments that need fast restarts. The hybrid RDB+AOF mode (Mode 4) solves this.


Mode 4: RDB + AOF (Hybrid)

# redis.conf
save 3600 1
save 300 100
appendonly yes
appendfsync everysec
# hybrid format (default in Redis 7)
aof-use-rdb-preamble yes

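With aof-use-rdb-preamble yes, an AOF rewrite writes the current dataset as an RDB snapshot and then appends incremental AOF commands after it. Up to Redis 6 both parts live in one file; from Redis 7.0 the RDB base and the incremental AOF are separate files inside the AOF directory (appenddirname), tied together by a manifest. Logically, the layout is: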

[RDB snapshot at time T] [AOF commands from T to now]
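The two layouts are easy to tell apart by their first bytes: RDB data starts with the magic string "REDIS", while a command-log AOF starts with a RESP array marker "*". A minimal sketch; aof_base_format and file_format are hypothetical helpers, and the file path is up to you (on Redis 7, the base file named in the manifest).

```python
import os
import tempfile

RDB_MAGIC = b"REDIS"  # every RDB file starts with "REDIS" followed by a version number


def aof_base_format(head: bytes) -> str:
    """Classify the first bytes of an AOF (or Redis 7 base file)."""
    if head[:5] == RDB_MAGIC:
        return "rdb-preamble"  # hybrid: RDB snapshot first, AOF commands after it
    if head[:1] == b"*":
        return "command-log"   # plain RESP commands from the first byte
    return "unknown"


def file_format(path: str) -> str:
    with open(path, "rb") as f:
        return aof_base_format(f.read(5))


# Demo with a throwaway file standing in for a command-log AOF.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "appendonly.aof")
    with open(path, "wb") as f:
        f.write(b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n")
    print(file_format(path))
```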

On restart:

  1. Redis loads the RDB preamble (fast binary load)
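  2. Redis replays only the AOF commands written since that snapshot (usually far smaller than the full history)
  3. Ready to serve once the RDB load finishes, which for large datasets is much faster than replaying every command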

Why both RDB and AOF:

  • If the AOF file is corrupted, you can restore from the last RDB snapshot (losing only commands since that snapshot)
  • The RDB snapshot provides fast startup
  • The AOF provides near-durability between snapshots

The Managed Redis Decision

If you use a managed Redis service (AWS ElastiCache, Google Memorystore, Redis Cloud, Upstash), the persistence configuration is often abstracted. Check what your provider actually offers:

  • ElastiCache Redis: AOF is not supported on current engine versions. Daily automatic and manual RDB-based backups are available on supported node types. Multi-AZ with automatic failover adds availability: failover to a replica typically takes seconds, but replication is asynchronous, so writes not yet copied to the replica can be lost.
  • Redis Cloud: Persistence is configurable per database (AOF or periodic snapshots). Upstash: Uses its own durable storage model rather than exposing redis.conf persistence settings; check its documentation for the exact guarantees.
  • Google Memorystore: Persistence is configurable; RDB snapshots available.

For managed services: Configure persistence through the provider's console or API, since redis.conf is usually not editable. Check whether your cluster mode (if any) handles persistence and replication together.


Production Configuration Checklist

Before enabling persistence:

  • Provision sufficient disk space: 2-3× the dataset size for RDB + AOF rewrite headroom
  • Ensure disk write speed matches your write throughput (SSD strongly recommended, especially with appendfsync always)
  • Set dir to a mount with sufficient I/O bandwidth separate from your OS disk
  • Configure stop-writes-on-bgsave-error yes (default) — this is your safety net for disk failures
  • Plan for BGSAVE memory overhead: provision 1.5× dataset size in RAM
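The sizing rules in this checklist reduce to simple arithmetic. A sketch that applies them, using the upper end (3×) of the disk range; provisioning_plan and shortfalls are hypothetical helpers, and the multipliers are planning starting points, not guarantees.

```python
# Rule-of-thumb sizing from the checklist: 2-3x the dataset on disk, 1.5x in RAM.
# These multipliers are planning starting points, not guarantees.
GIB = 1024 ** 3


def provisioning_plan(dataset_bytes: int, disk_factor: float = 3.0,
                      ram_factor: float = 1.5) -> dict:
    return {
        "disk_bytes": int(dataset_bytes * disk_factor),  # RDB + AOF + rewrite temp files
        "ram_bytes": int(dataset_bytes * ram_factor),    # fork copy-on-write headroom
    }


def shortfalls(dataset_bytes: int, disk_free: int, ram_total: int) -> list:
    plan = provisioning_plan(dataset_bytes)
    problems = []
    if disk_free < plan["disk_bytes"]:
        problems.append(f"disk: need {plan['disk_bytes'] / GIB:.0f} GiB, have {disk_free / GIB:.0f} GiB")
    if ram_total < plan["ram_bytes"]:
        problems.append(f"RAM: need {plan['ram_bytes'] / GIB:.0f} GiB, have {ram_total / GIB:.0f} GiB")
    return problems


print(shortfalls(10 * GIB, disk_free=20 * GIB, ram_total=16 * GIB))
```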

For AOF:

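  • Set appendfsync everysec unless you need the smallest possible loss window, in which case measure and accept the throughput cost of appendfsync always
  • Enable aof-use-rdb-preamble yes (default in Redis 7) for fast restarts
  • Keep auto-aof-rewrite-percentage 100 and auto-aof-rewrite-min-size 64mb (the defaults) unless you have measured a reason to change them
  • Monitor aof_current_size against aof_base_size: an AOF that never rewrites grows indefinitely
  • Verify file integrity regularly with redis-check-aof (on Redis 7 and later, pass the manifest in the AOF directory, e.g. appendonlydir/appendonly.aof.manifest, rather than a single appendonly.aof)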
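The two rewrite directives work together: an automatic rewrite starts once the AOF is larger than auto-aof-rewrite-min-size and has grown by at least auto-aof-rewrite-percentage over its size after the last rewrite. A simplified model of that trigger; should_rewrite_aof is hypothetical, while aof_current_size and aof_base_size are the real INFO persistence fields it stands in for.

```python
# Simplified model of the automatic AOF rewrite trigger (should_rewrite_aof is hypothetical).
# Inputs correspond to aof_current_size and aof_base_size from INFO persistence.
MB = 1024 * 1024  # redis.conf "mb" means 1024*1024 bytes


def should_rewrite_aof(current_size: int, base_size: int,
                       percentage: int = 100, min_size: int = 64 * MB) -> bool:
    if percentage == 0:
        return False  # auto-aof-rewrite-percentage 0 disables automatic rewrites
    if current_size <= min_size:
        return False  # small files are never rewritten automatically
    base = base_size or 1
    growth = current_size * 100 // base - 100  # percent growth since the last rewrite
    return growth >= percentage


print(should_rewrite_aof(100 * MB, 40 * MB))   # grew 150% and is above 64mb
print(should_rewrite_aof(60 * MB, 10 * MB))    # grew a lot, but still below 64mb
```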

For RDB:

  • Configure save directives appropriate to your data loss tolerance
  • Keep backup copies off-server (S3, GCS, etc.) — losing the server loses the local RDB
  • Test restore: periodically start a test Redis instance from a backup dump.rdb
  • Monitor rdb_last_bgsave_status: a value of err means your data is not being persisted
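The monitoring items above can be checked from the text that redis-cli INFO persistence returns. A minimal alerting sketch: the field names are real INFO fields, while parse_info, persistence_alerts, and the GROWTH_ALERT_FACTOR threshold are hypothetical choices for illustration.

```python
# Hypothetical alerting helper: feed it the text of `redis-cli INFO persistence`.
# GROWTH_ALERT_FACTOR is an arbitrary example threshold, not a Redis setting.
GROWTH_ALERT_FACTOR = 4


def parse_info(text: str) -> dict:
    """Parse INFO output ("field:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value
    return fields


def persistence_alerts(text: str) -> list:
    info = parse_info(text)
    alerts = []
    if info.get("rdb_last_bgsave_status") == "err":
        alerts.append("last BGSAVE failed: snapshots are not being persisted")
    if info.get("aof_enabled") == "1":
        if info.get("aof_last_write_status") == "err":
            alerts.append("last AOF write failed")
        if info.get("aof_last_bgrewrite_status") == "err":
            alerts.append("last AOF rewrite failed")
        current = int(info.get("aof_current_size", 0))
        base = int(info.get("aof_base_size", 0))
        if base and current > GROWTH_ALERT_FACTOR * base:
            alerts.append(f"AOF is {current / base:.1f}x its post-rewrite size; check rewrites")
    return alerts


SAMPLE = ("# Persistence\r\nrdb_last_bgsave_status:err\r\naof_enabled:1\r\n"
          "aof_last_write_status:ok\r\naof_last_bgrewrite_status:ok\r\n"
          "aof_current_size:500000000\r\naof_base_size:100000000\r\n")
for alert in persistence_alerts(SAMPLE):
    print(alert)
```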

The Wrong Configuration (and Its Consequences)

"I'll enable persistence later"

Engineers defer persistence configuration until after launch. Between launch and the first Redis restart, the dataset grows. When Redis restarts (OOM kill, deployment, EC2 instance replacement), all data is lost. Users are logged out, rate limits are reset, pending jobs vanish.

Fix: Configure persistence before deploying Redis for any use case that holds authoritative state.

appendfsync always on a write-heavy workload

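With appendfsync always, Redis fsyncs the AOF before replying to the writes executed in each event-loop iteration. Pipelined commands and concurrent clients share one fsync per iteration, but in the worst case, where each iteration handles only one or a few writes, 10,000 writes/second means close to 10,000 fsync() calls per second. On an SSD, each fsync() takes ~0.1–0.5ms, so that adds up to 1–5 seconds of fsync time per second of operation. Redis cannot keep up: write latency spikes and the command queue grows.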
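The arithmetic above can be written as a back-of-the-envelope budget. A sketch, not a benchmark: writes_per_batch is a hypothetical knob modeling how many writes share one fsync in an event-loop iteration, with 1 as the worst case.

```python
# Back-of-the-envelope fsync budget for appendfsync always (a sketch, not a benchmark).
# writes_per_batch models Redis sharing one fsync across the writes of an event-loop
# iteration; 1 is the worst case where every write gets its own fsync.
def fsync_load(writes_per_sec: float, fsync_ms: float, writes_per_batch: float = 1) -> float:
    """Seconds of fsync work needed per second of wall-clock time (>= 1 means saturated)."""
    fsyncs_per_sec = writes_per_sec / writes_per_batch
    return fsyncs_per_sec * fsync_ms / 1000.0


def max_fsyncs_per_sec(fsync_ms: float) -> float:
    """Upper bound on back-to-back fsync calls per second at a given latency."""
    return 1000.0 / fsync_ms


for latency_ms in (0.1, 0.5):
    load = fsync_load(10_000, latency_ms)
    print(f"{latency_ms} ms/fsync: {load:.1f} s of fsync per second, "
          f"ceiling {max_fsyncs_per_sec(latency_ms):.0f} fsyncs/s")
```

Any load of 1.0 or more means the disk is saturated; better batching (larger writes_per_batch) is what lets appendfsync always survive higher write rates.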

Fix: Use appendfsync everysec unless you have measured the write throughput and confirmed the disk can handle always at your write rate.

stop-writes-on-bgsave-error no

If stop-writes-on-bgsave-error is set to no, Redis keeps accepting writes even after BGSAVE fails (disk full, permissions error). Your in-memory state drifts further from the last successful snapshot with every write. When Redis eventually restarts, it loads that outdated snapshot, and the data loss is large and silent.

Fix: Keep stop-writes-on-bgsave-error yes (default). Build alerting on rdb_last_bgsave_status.


Decision Tree

Is Redis a pure cache (primary data is elsewhere)?
    → Yes: No persistence. Set save "" and appendonly no.

Does data loss up to the snapshot interval matter?
    → No: RDB only. Configure save directives.

Does data loss matter but you can tolerate ≤ 1 second?
    → Yes: AOF with appendfsync everysec.
    → Is fast restart critical (large dataset, SLA on restart time)?
        → Yes: RDB + AOF with aof-use-rdb-preamble yes.

Is zero data loss required?
    → Yes: AOF with appendfsync always + hardware-backed write cache (RAID controller with battery).
    → Consider: Redis is not the right primary store for zero-loss requirements. Use PostgreSQL.
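The same tree as a function, for use in a design review or a config linter. A sketch under stated assumptions: choose_persistence, its return labels, and the 300-second default snapshot interval are illustrative, and tolerances below one second are treated like the zero-loss branch, which the tree itself does not spell out.

```python
# Hypothetical encoding of the decision tree above; labels and thresholds are illustrative.
# max_loss_s: how many seconds of acknowledged writes you can afford to lose.
# snapshot_interval_s: your expected RDB interval (300 s here is only an example).
def choose_persistence(pure_cache: bool, max_loss_s: float,
                       fast_restart_critical: bool = False,
                       snapshot_interval_s: float = 300) -> str:
    if pure_cache:
        return "none"            # save "" and appendonly no
    if max_loss_s >= snapshot_interval_s:
        return "rdb"             # save directives only
    if max_loss_s >= 1:
        if fast_restart_critical:
            return "rdb+aof"     # appendfsync everysec + aof-use-rdb-preamble yes
        return "aof-everysec"
    # Below one second (including zero), everysec cannot meet the target.
    # Assumption: treat it like the zero-loss branch; consider a durable database instead.
    return "aof-always"


print(choose_persistence(pure_cache=False, max_loss_s=1, fast_restart_critical=True))
```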

Summary

  • No persistence: Correct for pure caches. Dangerous when Redis holds authoritative state.
  • RDB only: Correct when data loss up to minutes is acceptable and fast restarts matter.
  • AOF only: Correct when ≤ 1 second data loss is required. Slow restart for large datasets.
  • RDB + AOF hybrid: Production default. Fast startup (RDB preamble) + near-durability (AOF delta).
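  • appendfsync always ties write throughput to disk fsync latency: at the 0.1–0.5ms SSD figures above, that is at most about 2,000–10,000 fsyncs per second (far fewer on spinning disks or network volumes). Measure before using.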
  • stop-writes-on-bgsave-error yes is a safety net — build alerting on rdb_last_bgsave_status.
  • Managed Redis services may have limited persistence options — know your managed service's capabilities.

Next: P-4 — Memory Profiling and Optimization — diagnosing unexpected memory growth with INFO memory, MEMORY USAGE, and MEMORY DOCTOR, and the practical techniques that consistently recover the most RAM.

© 2026 Jatin Jain Saraf (JJS). All rights reserved.