Multi-stage builds, Alpine vs slim images, node_modules inside containers, non-root users, health checks, docker-compose, SIGTERM graceful shutdown (draining in-flight requests, closing DB pools, flushing log buffers), and the production readiness checklist.
Module P-14 — Dockerizing Node.js Applications for Production
What this module covers: A Docker image is a portable, reproducible build of your application. If it runs in your container, it runs in production — no more "works on my machine." This module goes deeper than the basics: Alpine vs slim base images and when each is right, why you never install devDependencies in a production image, .dockerignore as a security and performance tool, running as a non-root user, health checks that integrate with orchestrators, handling secrets without baking them into images, and a production readiness checklist that covers the most common containerisation mistakes. This module extends P-8's introduction and covers the remaining depth for production deployments.
The Node.js Image Landscape
The official Node.js Docker images come in three variants. Choosing the wrong one adds hundreds of megabytes and attack surface.
| Variant | Base | Size (node:22) | When to use |
|---|---|---|---|
| node:22 | Debian Bookworm | ~1.1 GB | Never in production |
| node:22-slim | Debian Bookworm slim | ~220 MB | Most production apps |
| node:22-alpine | Alpine Linux | ~60 MB | When size matters most |
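Alpine trade-offs: Alpine uses musl libc instead of glibc. Most npm packages are fine, but packages with native bindings (bcrypt, sharp, canvas) may not ship a prebuilt musl binary; they then compile from source during npm install, which needs build tools (python3, make, g++) in the build stage. Alpine's default shell is also BusyBox ash rather than bash, so shell scripts may need adjustments.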
Slim is the default production choice. It's Debian-based (glibc, familiar tooling), small enough, and has far fewer compatibility issues than Alpine.
```dockerfile
# Use the exact version: never 'latest' in production
FROM node:22.3.0-slim AS base

# Or (pick one): Alpine, when image size is critical and you've verified native modules work
FROM node:22.3.0-alpine AS base
```
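Pin the exact version, not the major tag. node:22-slim moves to every new Node 22 minor and patch release; node:22.3.0-slim stays on Node 22.3.0. Even an exact tag can be rebuilt with OS security updates, so pin by image digest (the `@sha256:` suffix) if you need a byte-for-byte immutable base.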
Multi-Stage Build: The Complete Pattern
Building on P-8, here is the complete production Dockerfile with every best practice applied:
```dockerfile
# Dockerfile

# ─── Stage 1: Dependencies ────────────────────────────────────────────────────
# Install ALL dependencies (including devDependencies needed for the build)
FROM node:22.3.0-slim AS deps
WORKDIR /app

# Copy manifests first to get a cacheable layer:
# if package.json and package-lock.json don't change, this layer is reused across builds
COPY package.json package-lock.json ./

# npm ci: clean install from the lockfile, no version drift
RUN npm ci

# ─── Stage 2: Builder ─────────────────────────────────────────────────────────
# Compile TypeScript → JavaScript
FROM node:22.3.0-slim AS builder
WORKDIR /app

# Copy all dependencies from the deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy manifests (npm run build and npm ci need them) and source files
COPY package.json package-lock.json ./
COPY tsconfig.json ./
COPY src ./src

# Compile: output to /app/dist
RUN npm run build

# Reinstall production-only dependencies AFTER compilation
# (devDependencies were needed to compile, not to run)
RUN npm ci --omit=dev

# ─── Stage 3: Production runtime ──────────────────────────────────────────────
FROM node:22.3.0-slim AS production
WORKDIR /app

# Security: create a non-root user
# Node apps should never run as root: a container escape as root = host root
RUN groupadd --gid 1001 nodejs \
    && useradd --uid 1001 --gid nodejs --shell /bin/bash --create-home appuser

# Copy production dependencies from builder, owned by the non-root user
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules

# Copy compiled application from builder
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist

# package.json is needed at runtime if it declares "type": "module"
COPY --from=builder --chown=appuser:nodejs /app/package.json ./

# Copy any non-compiled assets (email templates, static files, etc.)
# COPY --from=builder --chown=appuser:nodejs /app/public ./public

# Switch to the non-root user (files above were chowned on copy)
USER appuser

# Metadata: does not affect runtime
EXPOSE 3000
LABEL org.opencontainers.image.title="My API"
LABEL org.opencontainers.image.version="1.0.0"

# Health check: Docker marks the container unhealthy if this fails
# Timing values are examples; set --start-period to cover your app's startup time
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

# exec form: Node is PID 1 and receives SIGTERM directly
# shell form would make sh PID 1, and SIGTERM would not reach Node
CMD ["node", "dist/index.js"]
```
The three-stage pattern:
- deps: installs all `node_modules`, including devDependencies. Cached when `package.json` and `package-lock.json` don't change.
- builder: compiles TypeScript, then prunes to production dependencies.
- production: only the compiled JS and production `node_modules`. No TypeScript source, no devDependencies, no build tools.
The Health Check Endpoint
Your container orchestrator (Docker Swarm, Kubernetes, ECS) needs to know if your container is healthy before routing traffic to it. Without a health check, a container is assumed healthy the moment it starts — even if the app crashed during startup.
```typescript
// src/routes/health.routes.ts
import { Router } from 'express';
import prisma from '../db/prisma.js';
import redis from '../db/redis.js';

const router = Router();

// Mounted at /health in app.ts, so these paths are relative to /health

// Liveness: is the process alive? → GET /health
router.get('/', (_req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness: is the app ready to serve traffic? → GET /health/ready
// Returns 503 if any dependency is unhealthy
router.get('/ready', async (_req, res) => {
  const checks: Record<string, 'ok' | 'error'> = {};

  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = 'ok';
  } catch {
    checks.database = 'error';
  }

  try {
    await redis.ping();
    checks.redis = 'ok';
  } catch {
    checks.redis = 'error';
  }

  const healthy = Object.values(checks).every((v) => v === 'ok');
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
  });
});

export default router;
```
```typescript
// src/app.ts: register before auth middleware so health checks don't require a token
app.use('/health', healthRouter);

app.use(authenticate); // auth starts here
app.use('/users', usersRouter);
```
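The Dockerfile HEALTHCHECK calls /health (liveness only). It should be fast and never fail unless the Node process itself is broken. The /health/ready endpoint with dependency checks is for orchestrator readiness probes, not the Docker health check: if the database blips, you want traffic withheld from the container, not the container marked unhealthy (which Swarm responds to by replacing it). Kubernetes ignores the Dockerfile HEALTHCHECK entirely; point its livenessProbe at /health and its readinessProbe at /health/ready.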
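One gap in the readiness handler above: if the database is unreachable, `SELECT 1` can hang until the driver's own connection timeout, which may be longer than the probe's timeout, so the probe fails by timing out instead of getting a clean 503. The sketch below assumes each dependency check is an async function that rejects on failure; it runs the checks in parallel and races each one against a deadline. `runChecks` and `withTimeout` are hypothetical helpers, not part of Express or Prisma, and the 2-second default is an assumption.

```typescript
type CheckResult = 'ok' | 'error';
type Check = () => Promise<unknown>;

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Clear the timer whichever promise wins, so it never keeps the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: run all checks in parallel, each bounded by `timeoutMs`.
export async function runChecks(
  checks: Record<string, Check>,
  timeoutMs = 2_000,
): Promise<{ healthy: boolean; checks: Record<string, CheckResult> }> {
  const names = Object.keys(checks);
  // Promise.resolve().then(...) turns a synchronous throw into a rejection.
  // allSettled: one failing dependency must not hide the status of the others.
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(() => checks[name]()), timeoutMs)),
  );
  const results: Record<string, CheckResult> = {};
  settled.forEach((outcome, i) => {
    results[names[i]] = outcome.status === 'fulfilled' ? 'ok' : 'error';
  });
  const healthy = Object.values(results).every((r) => r === 'ok');
  return { healthy, checks: results };
}
```

In the readiness route, the two try/catch blocks would collapse into one call along the lines of `runChecks({ database: () => prisma.$queryRaw`...`, redis: () => redis.ping() })`, with the status code taken from `healthy`.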
.dockerignore: Security and Speed
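.dockerignore keeps files out of the build context. Every file sent to the Docker daemon is hashed for layer caching, so without a .dockerignore a large local node_modules (200MB+) adds seconds to every build. It is also a security control: anything in the build context can end up in an image through COPY . ., including .env files and Git history.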
```
# .dockerignore
# Dependencies — reinstalled inside the image
node_modules
# Build output — rebuilt inside the image
dist
build
.next
# Environment files — NEVER in the image
.env
.env.*
*.env
# Git
.git
.gitignore
# Logs
*.log
npm-debug.log*
# Test files — not needed in production image
**/__tests__
**/*.test.ts
**/*.spec.ts
coverage
jest.config.*
# Documentation
README.md
docs
# IDE
.vscode
.idea
*.swp
# OS
.DS_Store
Thumbs.db
# Docker files themselves
Dockerfile*
docker-compose*
```
Secrets: Never Bake Them Into Images
```dockerfile
# WRONG: secret ends up in the image layer permanently
ENV DATABASE_URL="postgresql://admin:password@prod-db:5432/myapp"

# ALSO WRONG: even with --build-arg, the value is in the image metadata
ARG DATABASE_URL
ENV DATABASE_URL=${DATABASE_URL}
```
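Secrets in image layers are visible to anyone who can pull the image: docker history and docker inspect show them, and layers can be extracted. They stay in the image even if you override the variable with docker run -e at runtime. The correct approaches: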
Option 1: Runtime environment variables (simplest):
```bash
docker run \
  -e DATABASE_URL="$DATABASE_URL" \
  -e JWT_ACCESS_SECRET="$JWT_ACCESS_SECRET" \
  myapp:latest
```
Option 2: .env file mounted at runtime:
```bash
docker run --env-file /etc/myapp/.env myapp:latest
```
Option 3: Docker secrets (for Docker Swarm):
```yaml
# docker-compose.yml (Swarm mode: docker stack deploy)
services:
  app:
    image: myapp:latest # stack deploy runs prebuilt images; build: is ignored
    secrets:
      - db_password
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true # created with: printf 'password' | docker secret create db_password -
```
Read the secret file in your app:
```typescript
import fs from 'fs';

// Prefer <NAME>_FILE (a mounted secret file); fall back to the plain <NAME> env var
function readSecret(name: string): string {
  const filePath = process.env[`${name}_FILE`];
  if (filePath) return fs.readFileSync(filePath, 'utf-8').trim();
  return process.env[name] ?? '';
}

const dbPassword = readSecret('DB_PASSWORD');
```
Option 4: Secrets manager (production best practice):
AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager. Fetch secrets at startup and cache them in memory, so the values never appear in environment variables or in files.
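The fetch-at-startup, cache-in-memory pattern can be sketched without tying it to one provider. This assumes a hypothetical `SecretFetcher` function that wraps whichever SDK you use; `loadSecrets` and `getSecret` are illustrative names. All secrets are fetched in parallel before the server starts, so a missing or empty secret fails the deploy immediately instead of surfacing on the first request that needs it.

```typescript
// Hypothetical provider-agnostic wrapper: resolves a secret name to its value.
export type SecretFetcher = (name: string) => Promise<string>;

// In-memory cache, populated once at startup.
const cache = new Map<string, string>();

// Fetch every required secret up front; reject if any is empty.
export async function loadSecrets(names: string[], fetchSecret: SecretFetcher): Promise<void> {
  const values = await Promise.all(names.map((name) => fetchSecret(name)));
  // Validate everything before caching anything, so a failure leaves no partial state.
  names.forEach((name, i) => {
    if (!values[i]) throw new Error(`Secret ${name} is empty`);
  });
  names.forEach((name, i) => cache.set(name, values[i]));
}

// Synchronous read for the rest of the app; throws if the secret was never loaded.
export function getSecret(name: string): string {
  const value = cache.get(name);
  if (value === undefined) throw new Error(`Secret ${name} was not loaded at startup`);
  return value;
}
```

At startup you would await something like `loadSecrets(['DB_PASSWORD', 'JWT_ACCESS_SECRET'], fetchFromVault)` before calling `server.listen()`, where `fetchFromVault` is your own (hypothetical) wrapper around the provider's client.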
Layer Caching: Fast Builds
Docker caches each layer. A layer is invalidated when its instruction or any of its inputs (such as the files a COPY reads) change, and every layer after it is rebuilt as well. The rule: least frequently changed content first.
```dockerfile
# Slow: any source code change invalidates the npm ci layer
COPY . .
RUN npm ci

# Fast: package.json changes rarely, so npm ci is cached on most builds
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
```
For monorepos or apps with many packages, use a cache mount:
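```dockerfile
# Requires BuildKit; /root/.npm is npm's cache directory when building as root
RUN --mount=type=cache,target=/root/.npm \
    npm ci
```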
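--mount=type=cache persists the npm cache directory across builds without making it part of the image layer. Even when package-lock.json changes and npm ci has to rerun, previously downloaded packages come from the local cache instead of the network. Cache mounts require BuildKit, the default builder in current Docker releases.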
docker-compose for Local Development
The local development compose file from P-8, extended with named volumes and proper dependency ordering:
```yaml
# docker-compose.dev.yml
version: '3.9'

services:
  api:
    build:
      context: .
      target: deps # stops after installing deps; uses ts-node for source
    volumes:
      - ./src:/app/src:ro # mount source read-only; changes trigger ts-node reload
      - ./tsconfig.json:/app/tsconfig.json:ro
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/myapp_dev
      REDIS_URL: redis://redis:6379
      PORT: 3000
    ports:
      - '3000:3000'
    command: npm run dev
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    environment:
      POSTGRES_DB: myapp_dev
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - '5432:5432'
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d myapp_dev']
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

  redis:
    image: redis:7-alpine
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redis_data:/data
    ports:
      - '6379:6379'
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
      interval: 5s
      retries: 5

volumes:
  postgres_data:
  redis_data:
```
```bash
# Development
docker-compose -f docker-compose.dev.yml up

# Production: uses the main Dockerfile's production stage
docker-compose up
```
Production Readiness Checklist
Before shipping a containerised Node.js app to production:
Image
- Multi-stage build: no TypeScript source or devDependencies in the production image
- Pinned base image version (`node:22.3.0-slim`, not `node:22-slim`)
- Non-root user (`USER appuser`)
- `.dockerignore` excludes `.env`, `node_modules`, `dist`, tests
- Image size is reasonable (< 300MB for most apps)
Runtime
- No secrets in the image (`ENV` instructions contain no credentials)
- Secrets injected via runtime env vars, env files, or a secrets manager
- `CMD` uses exec form (`["node", "dist/index.js"]`), not shell form
- Graceful shutdown handles `SIGTERM`: drains connections, closes DB pools, flushes logs
Health
- `/health` endpoint returns 200 when the process is alive
- `HEALTHCHECK` in Dockerfile calls the health endpoint
- `--start-period` accounts for Prisma migration / connection setup time
Process
- `uncaughtException` and `unhandledRejection` handlers log the error and exit with code 1
- Exits cleanly on fatal errors: let the orchestrator restart it
- Memory limit set in container config (`--memory 512m`): prevents one container from OOMing the host
Observability
- Structured JSON logs (Pino), not `console.log`
- Logs go to stdout/stderr; Docker captures them automatically
- No log files written inside the container (ephemeral filesystem)
- Correlation IDs on all log entries
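For the last Observability item, one common way to get a correlation ID onto every log line without threading it through every function call is Node's `AsyncLocalStorage`, which carries per-request context across awaits and callbacks. This is a minimal sketch with a hand-rolled JSON-to-stdout logger for illustration; with Pino you would typically feed the same store into a child logger or the `mixin` option instead. `withCorrelationId`, `formatLog`, and `log` are hypothetical names, and reading the ID from an `x-request-id` header is an assumed convention.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Per-request context, propagated automatically across awaits and callbacks.
const context = new AsyncLocalStorage<{ correlationId: string }>();

// Run `fn` inside a request context: reuse an incoming ID or mint a new UUID.
export function withCorrelationId<T>(incoming: string | undefined, fn: () => T): T {
  return context.run({ correlationId: incoming ?? randomUUID() }, fn);
}

// One JSON object per line, tagged with the current correlation ID (omitted outside a request).
export function formatLog(level: string, msg: string, fields: Record<string, unknown> = {}): string {
  const correlationId = context.getStore()?.correlationId;
  return JSON.stringify({ level, time: new Date().toISOString(), correlationId, msg, ...fields });
}

// Write to stdout so Docker's logging driver captures it.
export function log(level: string, msg: string, fields: Record<string, unknown> = {}): void {
  process.stdout.write(formatLog(level, msg, fields) + '\n');
}
```

In Express this would be wired as early middleware, for example `app.use((req, _res, next) => withCorrelationId(req.get('x-request-id'), next))`, so every handler and every log call downstream sees the same ID.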
Building and Tagging for CI/CD
```bash
# Build with a tag matching the git commit SHA
SHA=$(git rev-parse --short HEAD)
docker build \
  --target production \
  -t myapp:$SHA \
  -t myapp:latest \
  .

# Push to a registry (push the immutable SHA tag, not just latest)
docker tag myapp:$SHA ghcr.io/myorg/myapp:$SHA
docker tag myapp:latest ghcr.io/myorg/myapp:latest
docker push ghcr.io/myorg/myapp:$SHA
docker push ghcr.io/myorg/myapp:latest

# Inspect image layers to find what's making it large
docker history myapp:latest
dive myapp:latest   # separate tool: https://github.com/wagoodman/dive
```
Summary
- `node:22-slim` is the default production base: Debian-based, glibc-compatible, ~220MB. Use Alpine only when size is critical and you've verified native modules compile.
- Pin exact versions (`node:22.3.0-slim`): tags like `node:22` are mutable and will change under you.
- Three-stage build: deps (install all) → builder (compile + prune) → production (compiled JS + prod deps only). Source and devDependencies never reach production.
- Non-root user is mandatory. Container escape as root = host root. A `groupadd`/`useradd` plus a `USER` instruction fixes this.
- `HEALTHCHECK` tells Docker (and Swarm) whether your app is actually healthy; Kubernetes uses its own liveness and readiness probes. Without health checks, traffic routes to containers that are still initialising or have crashed post-startup.
- Never `ENV` secrets. Inject at runtime via `--env-file`, Docker secrets, or a secrets manager.
- Layer order matters: copy `package.json` first, run `npm ci`, then copy source code. This keeps the dependency layer cached across most builds.
- Logs to stdout: Docker's logging driver captures them, and your platform handles rotation and shipping to your log backend. No log files inside containers.
This completes Phase 2: The Practitioner. All 14 modules cover the complete production Node.js application stack — from architecture and auth through TypeScript, testing, security, caching, real-time, REST, GraphQL, background jobs, serialisation, and containerisation.
Next: Phase 3: The Architect begins with A-0 — the mental model reset that bridges everything you've built as a Practitioner to the performance engineering and distributed systems thinking required at the senior/principal level.
Graceful Shutdown — The Deploy Bug Nobody Talks About
Every Kubernetes rolling deploy and every docker stop sends SIGTERM to your Node.js process before killing it; a PM2 reload sends SIGINT by default. If your process doesn't handle the signal, the outcome is abrupt either way. A normal process dies immediately (SIGTERM's default action). Node running as PID 1 in a container ignores the signal instead, so Kubernetes waits for terminationGracePeriodSeconds (default 30s) and then sends SIGKILL (docker stop waits 10s). SIGKILL is immediate and cannot be handled: active HTTP connections are severed mid-response, database transactions are abandoned, BullMQ jobs that were halfway done are left stalled and get retried or failed, and Pino's buffer might not flush to stdout before the process dies.
This happens on every single deploy. The bugs are intermittent and hard to reproduce.
The Shutdown Sequence
```javascript
// src/shutdown.js
import { server } from './server.js'
import { pool } from './db.js'
import { worker } from './queue.js'
import logger from './logger.js'

let isShuttingDown = false

async function shutdown(signal, exitCode = 0) {
  if (isShuttingDown) return
  isShuttingDown = true
  logger.info({ signal }, 'Shutdown initiated')

  try {
    // Step 1: Stop accepting new connections
    // server.close() stops accepting new connections but waits for existing ones to finish
    await new Promise((resolve, reject) => {
      // Force close after 10s if connections don't drain
      const timer = setTimeout(() => {
        logger.warn('Forcing connection close after timeout')
        server.closeAllConnections() // Node 18.2+: destroy remaining sockets
        resolve()
      }, 10_000)
      server.close((err) => {
        clearTimeout(timer)
        if (err) reject(err)
        else resolve()
      })
    })
    logger.info('HTTP server closed')

    // Step 2: Stop BullMQ workers (finish current job, don't start new ones)
    await worker.close()
    logger.info('BullMQ worker stopped')

    // Step 3: Close database pool (waits for active queries to complete)
    await pool.end()
    logger.info('Database pool closed')
  } catch (err) {
    // A failing step must not leave the process hanging: log it and exit non-zero
    logger.error({ err }, 'Error during shutdown')
    exitCode = 1
  }

  // Step 4: Flush logs (Pino batches writes; flush before exit)
  await new Promise((resolve) => logger.flush(resolve))
  process.exit(exitCode)
}

process.on('SIGTERM', () => shutdown('SIGTERM'))
process.on('SIGINT', () => shutdown('SIGINT')) // Ctrl+C in development

// Fatal errors: don't silently swallow them; shut down and exit with code 1
process.on('unhandledRejection', (reason) => {
  logger.error({ reason }, 'Unhandled promise rejection')
  shutdown('unhandledRejection', 1)
})
process.on('uncaughtException', (err) => {
  logger.error({ err }, 'Uncaught exception')
  shutdown('uncaughtException', 1)
})
```
Kubernetes terminationGracePeriodSeconds Alignment
Kubernetes's default terminationGracePeriodSeconds is 30 seconds. Your application's total shutdown time must fit within this window. If your shutdown sequence takes 35 seconds (slow database drain), Kubernetes sends SIGKILL after 30 seconds, cutting your shutdown short.
```yaml
# k8s deployment.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60 # give the app 60s to shut down gracefully
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                # Optional: sleep 5s before SIGTERM so the load balancer removes the pod from rotation first
                command: ["/bin/sleep", "5"]
```
The preStop hook runs before SIGTERM is sent, and its duration counts against terminationGracePeriodSeconds. The sleep gives kube-proxy, ingress controllers, and external load balancers (some of which only notice on their next health-check poll) time to stop routing traffic to the pod before shutdown begins. Without it, new requests can still arrive while the server is shutting down.
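Because the preStop sleep and the app's own shutdown share one terminationGracePeriodSeconds window (Kubernetes starts the countdown before the hook runs), the app's internal deadline can be derived from those numbers instead of hard-coded. A minimal sketch: `shutdownBudgetMs`, `runWithDeadline`, and the 5-second safety margin are assumptions for illustration, not Kubernetes or Node APIs. Steps run in order; once the overall deadline passes, the remaining steps are skipped so the process can still flush logs and exit before SIGKILL. A step that rejects propagates its error to the caller.

```typescript
// Hypothetical helper: milliseconds the app may spend shutting down after SIGTERM arrives.
// The preStop duration is subtracted; the margin leaves room for log flushing and exit.
export function shutdownBudgetMs(gracePeriodSec: number, preStopSec: number, marginSec = 5): number {
  const budgetSec = gracePeriodSec - preStopSec - marginSec;
  if (budgetSec <= 0) {
    throw new Error('terminationGracePeriodSeconds is too short for preStop + margin');
  }
  return budgetSec * 1000;
}

// Run shutdown steps sequentially; stop early once the overall deadline passes.
// Returns the names of the steps that completed in time.
export async function runWithDeadline(
  steps: Array<[name: string, step: () => Promise<void>]>,
  deadlineMs: number,
): Promise<string[]> {
  const completed: string[] = [];
  const deadline = Date.now() + deadlineMs;
  for (const [name, step] of steps) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const expired = new Promise<'expired'>((resolve) => {
      timer = setTimeout(() => resolve('expired'), remaining);
    });
    const outcome = await Promise.race([step().then(() => 'done' as const), expired]);
    clearTimeout(timer);
    if (outcome === 'expired') break;
    completed.push(name);
  }
  return completed;
}
```

With the manifest above (60s grace period, 5s preStop sleep), `shutdownBudgetMs(60, 5)` leaves 50,000 ms for closing the HTTP server, stopping workers, and ending the database pool.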
Draining Active HTTP Connections
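server.close() stops accepting new connections, then waits for existing connections to close. Since Node 19 it also closes keep-alive connections that are idle at that moment, but a connection that is still mid-request keeps the server open until it finishes, however long that takes. Track sockets explicitly so anything still open after a drain window can be destroyed: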
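```javascript
import { server } from './server.js' // the http.Server instance

// Track open sockets so any that outlive the drain window can be destroyed
const connections = new Set()

server.on('connection', (socket) => {
  connections.add(socket)
  socket.on('close', () => connections.delete(socket))
})

async function closeServer(timeoutMs = 10_000) {
  // Stop accepting new connections; the callback fires once every connection has closed
  const closed = new Promise((resolve) => server.close(() => resolve()))

  // Close keep-alive sockets that are idle (not mid-request) right away
  server.closeIdleConnections()

  // Give in-flight requests time to finish, then destroy whatever is left
  const timer = setTimeout(() => {
    for (const socket of connections) socket.destroy()
  }, timeoutMs)

  await closed
  clearTimeout(timer)
}
```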
For graceful HTTP/2 draining (Fastify, h2), use the framework's built-in close method which handles HTTP/2 GOAWAY frames properly.
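Destroying sockets is the last resort. A gentler complement is to tell keep-alive clients not to reuse their connection once shutdown has begun, by sending `Connection: close` on responses served after SIGTERM; Node's HTTP server then closes the socket after writing that response, and the client's next request opens a fresh connection (in a cluster, to another instance). A minimal sketch using only `node:http`: `createDrainingServer` and `beginDraining` are hypothetical names, and requests already being handled when draining starts are not affected.

```typescript
import http from 'node:http';

// Hypothetical factory: wraps a request handler so responses sent during shutdown carry Connection: close.
export function createDrainingServer(handler: http.RequestListener) {
  let draining = false;
  const server = http.createServer((req, res) => {
    // Once draining, ask keep-alive clients to reconnect elsewhere for their next request.
    if (draining) res.setHeader('Connection', 'close');
    handler(req, res);
  });
  return {
    server,
    // Call from the SIGTERM handler, before server.close().
    beginDraining(): void {
      draining = true;
    },
  };
}
```

The design choice is to let clients finish their current exchange cleanly and leave voluntarily, so the destroy-after-timeout fallback only has to deal with genuinely stuck connections.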
Testing Your Shutdown Handler
```bash
# Simulate Kubernetes SIGTERM
kill -TERM $(pgrep -f "node src/index.js")

# Watch logs for the shutdown sequence
# Should see: "Shutdown initiated" → "HTTP server closed" → "BullMQ worker stopped" → "Database pool closed"
# Should NOT see: "SIGKILL" or abrupt process exit
```
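Load test during shutdown to verify no in-flight requests are dropped:

```bash
# Terminal 1: start load test
hey -z 30s -c 10 http://localhost:3000/health

# Terminal 2: send SIGTERM during the load test
sleep 5 && kill -TERM $(pgrep -f "node src/index.js")

# Expected: no request that was in flight at SIGTERM fails (no resets or truncated responses).
#   Locally, "connection refused" errors after the listener closes are expected,
#   because nothing removes the process from rotation the way a load balancer would.
# Bad: requests severed mid-response (connections reset mid-flight)
```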