Module P-14 · 25 min read

Multi-stage builds, Alpine vs slim images, node_modules inside containers, non-root users, health checks, docker-compose, SIGTERM graceful shutdown (draining in-flight requests, closing DB pools, flushing log buffers), and the production readiness checklist.

Module P-14 — Dockerizing Node.js Applications for Production

What this module covers: A Docker image is a portable, reproducible build of your application. If it runs in your container, it runs in production — no more "works on my machine." This module goes deeper than the basics: Alpine vs slim base images and when each is right, why you never install devDependencies in a production image, .dockerignore as a security and performance tool, running as a non-root user, health checks that integrate with orchestrators, handling secrets without baking them into images, and a production readiness checklist that covers the most common containerisation mistakes. This module extends P-8's introduction and covers the remaining depth for production deployments.

The Node.js Image Landscape

The official Node.js Docker images come in three variants. Choosing the wrong one adds hundreds of megabytes and attack surface.

Variant	Base	Size (node:22)	When to use
`node:22`	Debian Bookworm	~1.1 GB	Never in production
`node:22-slim`	Debian Bookworm slim	~220 MB	Most production apps
`node:22-alpine`	Alpine Linux	~60 MB	When size matters most

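Alpine trade-offs: Alpine uses musl libc instead of glibc. Most npm packages are fine. Packages with native bindings (bcrypt, sharp, canvas) either need a musl-compatible prebuilt binary or must be compiled inside the image, which means installing a build toolchain (python3, make, g++). Alpine also ships ash (BusyBox) rather than bash, so shell scripts may need adjustments.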

Slim is the default production choice. It's Debian-based (glibc, familiar tooling), small enough, and has far fewer compatibility issues than Alpine.

dockerfile
# Use the exact version — never 'latest' in production
FROM node:22.3.0-slim AS base

# Alpine — when image size is critical and you've verified native modules work
FROM node:22.3.0-alpine AS base

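Pin the exact version, not the major tag. node:22-slim moves every time Node 22 ships a minor or patch release; node:22.3.0-slim always contains Node 22.3.0. Even an exact-version tag can be rebuilt when the underlying OS packages are updated, so pin by digest (node:22.3.0-slim@sha256:<digest>) when you need a byte-for-byte reproducible base.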
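
The pinning rule can be enforced mechanically in CI. The following is a minimal sketch, assuming a Dockerfile is available as a string; `findFloatingBaseImages` is a hypothetical helper name, not part of Docker or any existing linter, and the version pattern is a simplification.

```typescript
// Hypothetical CI lint: report FROM images whose tag is not an exact x.y.z version.
// Accepts tags such as 22.3.0, 22.3.0-slim, 22.3.0-bookworm-slim
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[\w.-]+)?$/;

export function findFloatingBaseImages(dockerfile: string): string[] {
  const floating: string[] = [];
  const stageNames = new Set<string>();

  for (const line of dockerfile.split('\n')) {
    const match = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(line);
    if (!match) continue;

    const image = match[1];
    const alias = match[2];
    if (!image) continue;

    // "FROM deps" refers to an earlier build stage, not a registry image
    const isStageRef = stageNames.has(image.toLowerCase());
    if (alias) stageNames.add(alias.toLowerCase());
    if (isStageRef || image === 'scratch') continue;

    // A digest (image@sha256:...) is the strictest form of pinning
    if (image.includes('@sha256:')) continue;

    // No tag at all means the implicit "latest" tag
    const tag = image.includes(':') ? image.slice(image.lastIndexOf(':') + 1) : 'latest';
    if (!EXACT_VERSION.test(tag)) floating.push(image);
  }
  return floating;
}
```

A CI step could read the Dockerfile, call this function, and fail the build when the returned list is non-empty.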

Multi-Stage Build: The Complete Pattern

Building on P-8, here is the complete production Dockerfile with every best practice applied:

dockerfile
# Dockerfile

# ─── Stage 1: Dependencies ────────────────────────────────────────────────────
# Install ALL dependencies (including devDependencies needed for the build)
FROM node:22.3.0-slim AS deps

WORKDIR /app

# Copy manifests first — Docker cache layer
# If package.json doesn't change, this layer is cached across builds
COPY package.json package-lock.json ./

# npm ci: clean install from lockfile, no network variance
RUN npm ci


# ─── Stage 2: Builder ─────────────────────────────────────────────────────────
# Compile TypeScript → JavaScript
FROM node:22.3.0-slim AS builder

WORKDIR /app

# Copy all dependencies from the deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy manifests (npm run build and npm ci both read package.json) and source files
COPY package.json package-lock.json tsconfig.json ./
COPY src ./src

# Compile — output to /app/dist
RUN npm run build

# Prune to production-only dependencies AFTER compilation
# (devDependencies were needed to compile, not to run)
RUN npm ci --omit=dev


# ─── Stage 3: Production runtime ─────────────────────────────────────────────
FROM node:22.3.0-slim AS production

WORKDIR /app

# Security: create a non-root user
# Node apps should never run as root — a container escape as root = host root
RUN groupadd --gid 1001 nodejs \
    && useradd --uid 1001 --gid nodejs --shell /bin/bash --create-home appuser

# Copy production dependencies from builder
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules

# Copy compiled application from builder
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist

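# Copy any non-compiled assets (email templates, static files, etc.)
# COPY --from=builder --chown=appuser:nodejs /app/public ./public
# If package.json sets "type": "module", copy it too so Node loads dist/*.js as ES modules
# COPY --from=builder --chown=appuser:nodejs /app/package.json ./package.json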

# Switch to the non-root user (file ownership was already set by --chown above)
USER appuser

# Metadata — does not affect runtime
EXPOSE 3000
LABEL org.opencontainers.image.title="My API"
LABEL org.opencontainers.image.version="1.0.0"

# Health check — Docker will mark the container unhealthy if this fails
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

# exec form — Node is PID 1, receives SIGTERM directly
# shell form would make sh PID 1, SIGTERM would not reach Node
CMD ["node", "dist/index.js"]

The three-stage pattern:

deps — installs all node_modules including devDependencies. Cached when package.json doesn't change.
builder — compiles TypeScript, then prunes to production deps.
production — only the compiled JS and production node_modules. No TypeScript source, no devDependencies, no build tools.

The Health Check Endpoint

Your container orchestrator (Docker Swarm, Kubernetes, ECS) needs to know if your container is healthy before routing traffic to it. Without a health check, a container is assumed healthy the moment it starts — even if the app crashed during startup.

typescript
// src/routes/health.routes.ts
import { Router } from 'express';
import prisma from '../db/prisma.js';
import redis from '../db/redis.js';

const router = Router();

// Liveness — is the process alive?
router.get('/health', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness — is the app ready to serve traffic?
// Returns 503 if any dependency is unhealthy
router.get('/health/ready', async (req, res) => {
  const checks: Record<string, 'ok' | 'error'> = {};

  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = 'ok';
  } catch {
    checks.database = 'error';
  }

  try {
    await redis.ping();
    checks.redis = 'ok';
  } catch {
    checks.redis = 'error';
  }

  const healthy = Object.values(checks).every(v => v === 'ok');

  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
  });
});

export default router;

typescript
// src/app.ts — register before auth middleware so health checks don't require a token
app.use('/health', healthRouter);
app.use(authenticate);  // auth starts here
app.use('/users', usersRouter);

The Dockerfile HEALTHCHECK calls /health (liveness only) — it should be fast and never fail unless the Node process itself is broken. The /health/ready endpoint with dependency checks is for orchestrator readiness probes, not the Docker health check.
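
One gap in the readiness handler above is that each dependency check is awaited with no time limit, so a hung database connection would hang the probe itself. A minimal sketch of bounding each check follows; `withTimeout` and `runChecks` are illustrative helper names rather than library APIs, and the 2 second default is an assumption.

```typescript
type CheckResult = 'ok' | 'error';

// Reject if `work` has not settled within `ms` milliseconds
export async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

// Run all checks in parallel; a rejection or a timeout marks that check as 'error'
export async function runChecks(
  checks: Record<string, () => Promise<unknown>>,
  timeoutMs = 2_000,
): Promise<{ healthy: boolean; results: Record<string, CheckResult> }> {
  const entries = Object.entries(checks);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also converts a synchronous throw into a rejection
    entries.map(([, check]) => withTimeout(Promise.resolve().then(check), timeoutMs)),
  );

  const results: Record<string, CheckResult> = {};
  entries.forEach(([name], i) => {
    results[name] = settled[i]?.status === 'fulfilled' ? 'ok' : 'error';
  });
  return { healthy: Object.values(results).every((v) => v === 'ok'), results };
}

// Possible use inside the /health/ready handler (prisma and redis as imported above):
// const { healthy, results } = await runChecks({
//   database: () => prisma.$queryRaw`SELECT 1`,
//   redis: () => redis.ping(),
// });
```

Keep the timeout shorter than the orchestrator's probe timeout so the endpoint answers with 503 instead of the probe timing out.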

.dockerignore: Security and Speed

.dockerignore prevents files from entering the build context. Every file sent to the Docker daemon gets hashed for layer caching. Large node_modules (200MB+) without .dockerignore adds seconds to every build:

# .dockerignore

# Dependencies — reinstalled inside the image
node_modules

# Build output — rebuilt inside the image
dist
build
.next

# Environment files — NEVER in the image
.env
.env.*
*.env

# Git
.git
.gitignore

# Logs
*.log
npm-debug.log*

# Test files — not needed in production image
**/__tests__
**/*.test.ts
**/*.spec.ts
coverage
jest.config.*

# Documentation
README.md
docs

# IDE
.vscode
.idea
*.swp

# OS
.DS_Store
Thumbs.db

# Docker files themselves
Dockerfile*
docker-compose*

Secrets: Never Bake Them Into Images

dockerfile
# WRONG — secret ends up in the image layer permanently
ENV DATABASE_URL="postgresql://admin:password@prod-db:5432/myapp"

# ALSO WRONG — even with --build-arg, the value is in the image metadata
ARG DATABASE_URL
ENV DATABASE_URL=${DATABASE_URL}

Secrets in image layers are visible to anyone who can pull the image. They persist even after a docker run that overwrites them. The correct approaches:

Option 1: Runtime environment variables (simplest):

bash
docker run \
  -e DATABASE_URL="$DATABASE_URL" \
  -e JWT_ACCESS_SECRET="$JWT_ACCESS_SECRET" \
  myapp:latest

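Option 2: .env file loaded at runtime (docker run reads the file and sets each entry as an environment variable; the file itself is not mounted into the container):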

bash
docker run --env-file /etc/myapp/.env myapp:latest

Option 3: Docker secrets (for Docker Swarm):

yaml
# docker-compose.yml (Swarm mode)
services:
  app:
    secrets:
      - db_password
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true  # created with: printf 'password' | docker secret create db_password -

Read the secret file in your app:

typescript
import fs from 'fs';

function readSecret(name: string): string {
  const filePath = process.env[`${name}_FILE`];
  if (filePath) return fs.readFileSync(filePath, 'utf-8').trim();
  return process.env[name] ?? '';
}

const dbPassword = readSecret('DB_PASSWORD');
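
Because readSecret above falls back to an empty string, a missing secret only surfaces later as a confusing connection error. A minimal sketch of a fail-fast variant follows, using the same NAME / NAME_FILE convention; `loadRequiredSecrets` is a hypothetical name and the secret names in the usage comment are examples.

```typescript
import { readFileSync } from 'node:fs';

// Resolve every required secret at startup and throw if any is missing
export function loadRequiredSecrets<K extends string>(
  names: readonly K[],
  env: Record<string, string | undefined> = process.env,
): Record<K, string> {
  const resolved = {} as Record<K, string>;
  const missing: string[] = [];

  for (const name of names) {
    const filePath = env[`${name}_FILE`];
    // Prefer the mounted secret file; fall back to the plain environment variable
    const value = filePath ? readFileSync(filePath, 'utf-8').trim() : env[name];
    if (value) resolved[name] = value;
    else missing.push(name);
  }

  if (missing.length > 0) {
    // Report names only; never log secret values
    throw new Error(`Missing required secrets: ${missing.join(', ')}`);
  }
  return resolved;
}

// Possible use at the top of the entry point:
// const secrets = loadRequiredSecrets(['DB_PASSWORD', 'JWT_ACCESS_SECRET'] as const);
```

Crashing at startup lets the orchestrator report the failed deploy immediately instead of routing traffic to a misconfigured container.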

Option 4: Secrets manager (production best practice):

AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager. Fetch secrets at startup and cache them in memory, so the secret values never need to sit in environment variables or in files on disk.
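
The "fetch at startup, cache in memory" pattern can be sketched independently of any vendor. In the sketch below, `SecretFetcher` is a placeholder for a real client call and is not the API of AWS, Vault, or Google; the five minute TTL is likewise an assumption.

```typescript
// Placeholder for a real secrets-manager client call
type SecretFetcher = (name: string) => Promise<string>;

export function createSecretCache(fetchSecret: SecretFetcher, ttlMs = 5 * 60_000) {
  const cache = new Map<string, { value: Promise<string>; expiresAt: number }>();

  return function getSecret(name: string): Promise<string> {
    const hit = cache.get(name);
    if (hit && hit.expiresAt > Date.now()) return hit.value;

    // Cache the promise itself so concurrent callers share one in-flight fetch
    const value = fetchSecret(name).catch((err: unknown) => {
      cache.delete(name); // do not cache failures
      throw err;
    });
    cache.set(name, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}
```

A TTL lets rotated secrets be picked up without a restart; whether that matters depends on how your secrets are rotated.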

Layer Caching: Fast Builds

Docker caches each layer. A layer is invalidated when its instruction or any of its inputs change. The rule: least frequently changed content first.

dockerfile
# Slow — source code change invalidates npm ci
COPY . .
RUN npm ci

# Fast — package.json change is rare; npm ci is cached most builds
COPY package.json package-lock.json ./
RUN npm ci
COPY . .

For monorepos or apps with many packages, use a cache mount:

dockerfile
RUN --mount=type=cache,target=/root/.npm \
    npm ci

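--mount=type=cache requires BuildKit (the default builder in current Docker releases). It persists the npm cache across builds without the cache becoming part of an image layer, so installs are faster and the image is no larger.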

docker-compose for Local Development

The local development compose file from P-8, extended with named volumes and proper dependency ordering:

yaml
# docker-compose.dev.yml
# No top-level "version" key: Docker Compose v2 treats it as obsolete and ignores it

services:
  api:
    build:
      context: .
      target: deps           # stops after installing deps — uses ts-node for source
    volumes:
      - ./src:/app/src:ro    # mount source read-only; the dev script's watcher (nodemon, ts-node-dev, or similar) reloads on change
      - ./tsconfig.json:/app/tsconfig.json:ro
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/myapp_dev
      REDIS_URL: redis://redis:6379
      PORT: 3000
    ports:
      - '3000:3000'
    command: npm run dev
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    environment:
      POSTGRES_DB: myapp_dev
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - '5432:5432'
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d myapp_dev']
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

  redis:
    image: redis:7-alpine
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redis_data:/data
    ports:
      - '6379:6379'
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
      interval: 5s
      retries: 5

volumes:
  postgres_data:
  redis_data:

bash
# Development
docker-compose -f docker-compose.dev.yml up

# Production — uses the main Dockerfile's production stage
docker-compose up

Production Readiness Checklist

Before shipping a containerised Node.js app to production:

Image

Multi-stage build — no TypeScript source or devDependencies in production image
Pinned base image version (node:22.3.0-slim, not node:22-slim)
Non-root user (USER appuser)
.dockerignore excludes .env, node_modules, dist, tests
Image size is reasonable (< 300MB for most apps)

Runtime

No secrets in the image (ENV instructions contain no credentials)
Secrets injected via runtime env vars, env files, or secrets manager
CMD uses exec form (["node", "dist/index.js"]), not shell form
Graceful shutdown handles SIGTERM — drains connections, closes DB

Health

/health endpoint returns 200 when the process is alive
HEALTHCHECK in Dockerfile calls the health endpoint
--start-period accounts for Prisma migration / connection setup time

Process

uncaughtException and unhandledRejection handlers log the error and exit with code 1
Exits cleanly on fatal errors — let the orchestrator restart it
Memory limit set in container config (--memory 512m) — prevents one container from OOMing the host

Observability

Structured JSON logs (Pino) — not console.log
Logs go to stdout/stderr — Docker captures them automatically
No log files written inside the container (ephemeral filesystem)
Correlation IDs on all log entries
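
The Process items above ("log the error and exit with code 1") can be sketched as a small handler factory. The names `createFatalHandler` and `FatalLogger` are illustrative; the logger and exit function are injected so the handler can be unit-tested, and in a real app they would be your Pino logger and process.exit.

```typescript
type FatalLogger = (fields: Record<string, unknown>, message: string) => void;

export function createFatalHandler(
  kind: string,
  log: FatalLogger,
  exit: (code: number) => void,
) {
  return (error: unknown): void => {
    // Rejections can carry any value, so normalise to an Error first
    const err = error instanceof Error ? error : new Error(String(error));
    log({ kind, message: err.message, stack: err.stack }, 'Fatal error, exiting');
    // A non-zero exit code tells the orchestrator this was a crash, so it restarts the container
    exit(1);
  };
}

// Example wiring (console.error writes to stderr, which Docker captures):
// const log: FatalLogger = (fields, msg) =>
//   console.error(JSON.stringify({ level: 'fatal', msg, ...fields }));
// process.on('uncaughtException', createFatalHandler('uncaughtException', log, process.exit));
// process.on('unhandledRejection', createFatalHandler('unhandledRejection', log, process.exit));
```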

Building and Tagging for CI/CD

bash
# Build with a tag matching the git commit SHA
docker build \
  --target production \
  -t myapp:$(git rev-parse --short HEAD) \
  -t myapp:latest \
  .

# Push to a registry
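SHA=$(git rev-parse --short HEAD)
docker tag myapp:$SHA ghcr.io/myorg/myapp:$SHA     # deploy by this immutable tag
docker push ghcr.io/myorg/myapp:$SHA
docker tag myapp:latest ghcr.io/myorg/myapp:latest
docker push ghcr.io/myorg/myapp:latest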

# Inspect image layers — find what's making it large
docker history myapp:latest
dive myapp:latest  # standalone CLI, not a docker subcommand: https://github.com/wagoodman/dive

Summary

node:22-slim is the default production base — Debian-based, glibc-compatible, 220MB. Use Alpine only when size is critical and you've verified native modules compile.
Pin exact versions (node:22.3.0-slim) — tags like node:22 are mutable and will change under you.
Three-stage build: deps (install all) → builder (compile + prune) → production (compiled JS + prod deps only). Source and devDependencies never reach production.
Non-root user is mandatory. Container escape as root = host root. One useradd and USER instruction fixes this.
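HEALTHCHECK tells Docker and Docker Swarm whether the container is healthy. Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness and readiness probes, so point those at /health and /health/ready. Without any health check, traffic routes to containers that are still initialising or have crashed post-startup.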
Never ENV secrets. Inject at runtime via --env-file, Docker secrets, or a secrets manager.
Layer order matters: copy package.json first, npm ci, then source code. Keeps the dependency layer cached across most builds.
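Logs to stdout: the Docker logging driver or your platform's log collector handles rotation, aggregation, and shipping to your log platform (the default json-file driver does not rotate unless you set max-size). No log files inside containers.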

This completes Phase 2: The Practitioner. All 14 modules cover the complete production Node.js application stack — from architecture and auth through TypeScript, testing, security, caching, real-time, REST, GraphQL, background jobs, serialisation, and containerisation.

Next: Phase 3: The Architect begins with A-0 — the mental model reset that bridges everything you've built as a Practitioner to the performance engineering and distributed systems thinking required at the senior/principal level.

Graceful Shutdown — The Deploy Bug Nobody Talks About

Every Kubernetes rolling deploy and every Docker container stop sends SIGTERM to your Node.js process before killing it (PM2 sends SIGINT by default, which the handler below also covers). If your process doesn't handle the signal, here's what happens: Kubernetes waits for terminationGracePeriodSeconds (default 30s; docker stop waits 10s by default), then sends SIGKILL. SIGKILL is immediate and cannot be handled: active HTTP connections are severed mid-response, database transactions are abandoned, BullMQ jobs that were halfway done are left stalled or marked as failed, and Pino's buffer might not flush to stdout before the process dies.

This happens on every single deploy. The bugs are intermittent and hard to reproduce.

The Shutdown Sequence

javascript
// src/shutdown.js
import { server } from './server.js'
import { pool } from './db.js'
import { worker } from './queue.js'
import logger from './logger.js'

let isShuttingDown = false

async function shutdown(signal, exitCode = 0) {
  if (isShuttingDown) return
  isShuttingDown = true

  logger.info({ signal }, 'Shutdown initiated')

  try {
    // Step 1: Stop accepting new connections
    // server.close() stops accepting new connections and waits for open ones to finish
    await new Promise((resolve, reject) => {
      // If connections don't drain within 10s, force them closed so close() can complete
      const forceTimer = setTimeout(() => {
        logger.warn('Forcing connection close after timeout')
        server.closeAllConnections() // http.Server method, Node 18.2+
      }, 10_000)

      server.close((err) => {
        clearTimeout(forceTimer)
        if (err) reject(err)
        else resolve()
      })
    })
    logger.info('HTTP server closed')

    // Step 2: Stop BullMQ workers (finish current job, don't start new ones)
    await worker.close()
    logger.info('BullMQ worker stopped')

    // Step 3: Close database pool (waits for active queries to complete)
    await pool.end()
    logger.info('Database pool closed')
  } catch (err) {
    // A failed step must not leave the process hanging until SIGKILL
    logger.error({ err }, 'Error during shutdown')
    exitCode = 1
  }

  // Step 4: Flush logs (Pino buffers writes on async destinations, so flush before exit)
  await new Promise((resolve) => logger.flush(resolve))

  process.exit(exitCode)
}

process.on('SIGTERM', () => shutdown('SIGTERM'))
process.on('SIGINT', () => shutdown('SIGINT'))   // Ctrl+C in development

// Catch unhandled promise rejections: log them, then exit non-zero so the orchestrator restarts the container
process.on('unhandledRejection', (reason) => {
  logger.error({ reason }, 'Unhandled promise rejection')
  shutdown('unhandledRejection', 1)
})

Kubernetes `terminationGracePeriodSeconds` Alignment

Kubernetes's default terminationGracePeriodSeconds is 30 seconds. Your application's total shutdown time must fit within this window. If your shutdown sequence takes 35 seconds (slow database drain), Kubernetes sends SIGKILL after 30 seconds, cutting your shutdown short.

yaml
# k8s deployment.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # give the app 60s to shut down gracefully
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                # Optional: sleep 5s before SIGTERM so load balancer removes pod from rotation first
                command: ["/bin/sleep", "5"]

The preStop hook runs before SIGTERM is sent, and its duration counts against terminationGracePeriodSeconds. The sleep gives Kubernetes time to remove the pod from the Service endpoints, and the load balancer time to stop routing traffic to it, before the shutdown begins. Without this, new requests can still arrive after the server has stopped accepting connections.
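
The alignment is simple arithmetic: the preStop sleep plus the worst case of every sequential shutdown step has to stay below the grace period. A minimal sketch of that check follows; `shutdownHeadroomMs` is a hypothetical helper, and the step names and timeouts in the usage comment are illustrative assumptions, not measurements.

```typescript
interface ShutdownBudget {
  gracePeriodSeconds: number;             // terminationGracePeriodSeconds
  preStopSeconds: number;                 // preStop sleep, which counts against the grace period
  stepTimeoutsMs: Record<string, number>; // upper bound for each sequential shutdown step
}

export function shutdownHeadroomMs(budget: ShutdownBudget): number {
  const stepsMs = Object.values(budget.stepTimeoutsMs).reduce((sum, ms) => sum + ms, 0);
  const worstCaseMs = budget.preStopSeconds * 1000 + stepsMs;
  // Positive: time to spare before SIGKILL. Negative: the sequence can be cut short.
  return budget.gracePeriodSeconds * 1000 - worstCaseMs;
}

// With a 60s grace period, a 5s preStop sleep, and steps bounded at 10s + 30s + 5s + 1s:
// shutdownHeadroomMs({
//   gracePeriodSeconds: 60,
//   preStopSeconds: 5,
//   stepTimeoutsMs: { http: 10_000, worker: 30_000, db: 5_000, logs: 1_000 },
// }) returns 9000, i.e. 9 seconds of headroom.
```

The same steps under the default 30 second grace period leave a negative headroom, which is the situation where SIGKILL interrupts the sequence.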

Draining Active HTTP Connections

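server.close() stops accepting new connections and waits for existing ones to finish. Since Node 19 it also closes idle keep-alive connections; on older versions idle keep-alive connections can hold the server open indefinitely. Either way, a connection with a slow in-flight request can outlast your grace period. Close idle connections explicitly, let active requests finish, and force-close whatever remains after a deadline:

javascript
// `server` is the http.Server returned by http.createServer() or app.listen()
function closeServer(server, forceAfterMs = 10_000) {
  return new Promise((resolve, reject) => {
    // Stop accepting new connections; the callback fires once every connection has closed
    server.close((err) => (err ? reject(err) : resolve()))

    // Close keep-alive connections that are idle right now (Node 18.2+)
    server.closeIdleConnections()

    // Destroy connections that are still open after the deadline, in-flight or not
    const forceTimer = setTimeout(() => server.closeAllConnections(), forceAfterMs)
    forceTimer.unref() // don't keep the process alive just for this timer
  })
}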

For graceful HTTP/2 draining (Fastify, h2), use the framework's built-in close method which handles HTTP/2 GOAWAY frames properly.
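
For HTTP/1.1, draining goes more smoothly if clients are told to drop keep-alive as soon as shutdown starts: Node closes the socket after a response sent with `Connection: close`. A minimal sketch follows; `createShutdownGate` is an illustrative name, and the middleware uses an Express/Connect-style (req, res, next) signature with structural types so it does not depend on a framework.

```typescript
// Express's Response and Node's http.ServerResponse both satisfy this shape
interface ResponseLike {
  setHeader(name: string, value: string): unknown;
}

export function createShutdownGate() {
  let shuttingDown = false;

  return {
    // Call at the start of the shutdown sequence, before server.close()
    begin(): void {
      shuttingDown = true;
    },
    // Lets a readiness endpoint answer 503 while the process is draining
    isShuttingDown(): boolean {
      return shuttingDown;
    },
    // Requests are still served during the drain, but each response ends its connection
    middleware(_req: unknown, res: ResponseLike, next: () => void): void {
      if (shuttingDown) res.setHeader('Connection', 'close');
      next();
    },
  };
}

// Possible wiring:
// const gate = createShutdownGate();
// app.use(gate.middleware);   // register before the routes
// gate.begin();               // first line of the shutdown handler
```

Requests that arrive on existing connections during the drain still get a normal response; the header only ensures the connection is not reused afterwards.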

Testing Your Shutdown Handler

bash
# Simulate Kubernetes SIGTERM
kill -TERM $(pgrep -f "node src/index.js")

# Watch logs for the shutdown sequence
# Should see: "Shutdown initiated" → "HTTP server closed" → "BullMQ worker stopped" → "Database pool closed"
# Should NOT see: "Forcing connection close after timeout" or the process exiting before "Database pool closed"

Load test during shutdown to verify no requests are dropped:

bash
# Terminal 1: start load test
hey -z 30s -c 10 http://localhost:3000/health

# Terminal 2: send SIGTERM during the load test
sleep 5 && kill -TERM $(pgrep -f "node src/index.js")

# Expected: every request accepted before SIGTERM completes. Once the server has closed,
#           "connection refused" errors are normal here because a single local process has no replacement
# Bad: "EOF" or "connection reset" errors (connections severed mid-flight)
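
The manual SIGTERM check can also be automated in a test suite. The sketch below spawns a Node process, waits for a ready marker on stdout, sends SIGTERM, and resolves with the exit code. `exitCodeAfterSigterm` and the "ready" marker are illustrative assumptions; your entry point would need to print such a marker once it is listening. It assumes a POSIX platform, since Windows does not deliver SIGTERM to a handler.

```typescript
import { spawn } from 'node:child_process';

export function exitCodeAfterSigterm(
  args: string[],
  readyMarker = 'ready',
  killAfterMs = 15_000,
): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, args, { stdio: ['ignore', 'pipe', 'inherit'] });
    let output = '';
    let signalled = false;

    // Last resort so a hung shutdown cannot hang the test run
    const killTimer = setTimeout(() => child.kill('SIGKILL'), killAfterMs);

    child.stdout.on('data', (chunk: Buffer) => {
      output += chunk.toString();
      if (!signalled && output.includes(readyMarker)) {
        signalled = true;
        child.kill('SIGTERM'); // same signal Kubernetes and docker stop send
      }
    });

    child.on('error', reject);
    child.on('exit', (code) => {
      clearTimeout(killTimer);
      // 0 means the handler ran to completion; null means the process died from a signal
      resolve(code);
    });
  });
}

// Possible use in a test:
// const code = await exitCodeAfterSigterm(['dist/index.js']);
// expect(code).toBe(0);
```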
