Module A-14 · 27 min read

Standalone output internals, static export feature graveyard (what silently breaks), WebSocket authentication (HttpOnly cookie on upgrade handshake, token validation on reconnect, secret rotation with 40K live connections), Pusher/Ably vs Partykit vs self-hosted socket layer, and the cold start optimisation playbook.

A-14 — Self-Hosting vs Serverless, WebSockets, and Long-Lived Connections

Who this is for: Architects making infrastructure decisions for Next.js applications that have outgrown the "just deploy to Vercel" answer — teams self-hosting on Kubernetes, applications that need WebSockets or Server-Sent Events, and anyone who needs to understand the real constraints that serverless imposes on connection-oriented features.

The Fundamental Serverless Constraint

Serverless (Vercel Functions, AWS Lambda, Cloudflare Workers) is the dominant deployment model for Next.js because it fits the request/response shape of most of the framework's work: a request comes in, the function runs, the response goes out. For stateless HTTP, this is ideal.

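The constraint: serverless functions don't persist between requests in any way you can rely on. Each invocation may run in a new container; a warm instance is sometimes reused, but nothing guarantees it. So there's no dependable in-process state, no long-lived connections, and no background threads that outlive the response.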

This constraint rules out a specific category of features:

WebSocket servers (require persistent TCP connections)
Server-Sent Events streams that must stay open longer than the platform's maximum function duration (require persistent HTTP connections)
In-memory job queues (state lost on restart)
Background processing beyond the request duration
In-process caches shared across requests (each instance has its own memory)

These are not framework limitations — they're infrastructure physics. The question is which of these your application actually needs, and what the alternatives are.

WebSockets in a Next.js Application

Next.js doesn't have built-in WebSocket support. The Route Handler model (request → response) doesn't map to WebSockets (persistent bidirectional connection). This is not a missing feature — it's an architectural mismatch.

The three architectural patterns for WebSockets alongside Next.js:

Pattern 1: Separate WebSocket server

The most common production pattern. A dedicated Node.js service (Socket.io, ws, uWebSockets.js) handles WebSocket connections. The Next.js application communicates with it via HTTP or a message broker.

Browser
  ├── HTTP requests → Next.js (Vercel/serverless)
  └── WebSocket → ws.example.com (always-on Node.js service)

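tsx
// components/ChatRoom.tsx
'use client';

import { useEffect, useRef, useState } from 'react';

type ChatMessage = { id: string; text: string }; // assumed message shape

export function ChatRoom({ roomId }: { roomId: string }) {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const wsRef = useRef<WebSocket | null>(null); // lets event handlers send on the open socket

  useEffect(() => {
    const ws = new WebSocket(`wss://ws.example.com/room/${encodeURIComponent(roomId)}`);
    wsRef.current = ws;

    ws.onmessage = (event) => {
      const message: ChatMessage = JSON.parse(event.data);
      setMessages((prev) => [...prev, message]);
    };

    // Close the socket when the room changes or the component unmounts
    return () => {
      ws.close();
      wsRef.current = null;
    };
  }, [roomId]);

  return (
    <ul>
      {messages.map((m) => (
        <li key={m.id}>{m.text}</li>
      ))}
    </ul>
  );
}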
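The client is only half of Pattern 1: the service behind ws.example.com has to track which sockets are in which room and fan messages out to them. Below is a minimal sketch of that bookkeeping, kept independent of the socket library so the logic is easy to test. RoomRegistry, roomIdFromPath, and the Sendable interface are hypothetical names, and the /room/:id path scheme is assumed from the client URL. With the ws package, you would call join from the server's connection handler (parsing req.url) and leave from each socket's close handler.

```typescript
// Minimal socket shape: anything with send() and readyState, like a ws WebSocket
interface Sendable {
  readyState: number
  send(data: string): void
}

const OPEN = 1 // WebSocket.OPEN in both the browser API and the ws package

// Extract the room id from a path like "/room/abc123" (assumed URL scheme)
export function roomIdFromPath(path: string): string | null {
  const match = /^\/room\/([^/?#]+)/.exec(path)
  return match ? decodeURIComponent(match[1]) : null
}

export class RoomRegistry<S extends Sendable = Sendable> {
  private rooms = new Map<string, Set<S>>()

  join(roomId: string, socket: S): void {
    let members = this.rooms.get(roomId)
    if (!members) {
      members = new Set()
      this.rooms.set(roomId, members)
    }
    members.add(socket)
  }

  leave(roomId: string, socket: S): void {
    const members = this.rooms.get(roomId)
    if (!members) return
    members.delete(socket)
    // Drop empty rooms so the map doesn't grow without bound
    if (members.size === 0) this.rooms.delete(roomId)
  }

  // Send to every open socket in the room (optionally skipping one);
  // returns how many sockets the message was sent to
  broadcast(roomId: string, data: string, except?: S): number {
    let sent = 0
    for (const socket of this.rooms.get(roomId) ?? []) {
      if (socket === except || socket.readyState !== OPEN) continue
      socket.send(data)
      sent++
    }
    return sent
  }

  size(roomId: string): number {
    return this.rooms.get(roomId)?.size ?? 0
  }
}
```

Because every room lives in one process's memory, this only works as-is for a single WebSocket instance (or with sticky routing by room); spreading one room across several instances needs a broker between them.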

Pattern 2: Partykit / Cloudflare Durable Objects

Cloudflare Durable Objects are stateful edge workers — they persist state and accept WebSocket connections. Partykit is a higher-level abstraction built on Durable Objects.

ts
// party/chat.ts (Partykit server)
import type * as Party from 'partykit/server';

export default class ChatRoom implements Party.Server {
  // Held in memory only: lost if the room is evicted (persist via this.room.storage if needed)
  messages: string[] = [];

  // Partykit passes the room in; broadcast and storage live on it
  constructor(readonly room: Party.Room) {}

  onConnect(conn: Party.Connection) {
    conn.send(JSON.stringify({ type: 'history', messages: this.messages }));
  }

  onMessage(message: string) {
    this.messages.push(message);
    this.room.broadcast(message); // send to all connected clients
  }
}

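This runs on Cloudflare's network rather than on your own servers. Each room is a single Durable Object instance, typically created in a Cloudflare location near the first client that connects to it, and every client in the room talks to that one instance. For real-time applications with users spread across regions this is a compelling architecture, though a participant far from the room's location still pays that round trip.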

Pattern 3: Serverless WebSockets (Pusher, Ably, Soketi)

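Managed WebSocket infrastructure as a service. Your Next.js application publishes events to the service via HTTP. The service maintains WebSocket connections to clients and delivers events. (Soketi is the exception in this list: it's an open-source, Pusher-protocol-compatible server you run yourself, but the publishing model is the same.)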

ts
// Server Action publishes an event
'use server';
import Pusher from 'pusher';

const pusher = new Pusher({
  appId: process.env.PUSHER_APP_ID!,
  key: process.env.PUSHER_KEY!,
  secret: process.env.PUSHER_SECRET!,
  cluster: process.env.PUSHER_CLUSTER!,
});

export async function sendMessage(roomId: string, message: string) {
  await pusher.trigger(`room-${roomId}`, 'new-message', { message });
}

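tsx
// Client subscribes via Pusher JS SDK
'use client';
import { useEffect, useState } from 'react';
import Pusher from 'pusher-js';

// Hypothetical component name; any client component can host this effect
export function RoomFeed({ roomId }: { roomId: string }) {
  const [messages, setMessages] = useState<string[]>([]);

  useEffect(() => {
    const pusher = new Pusher(process.env.NEXT_PUBLIC_PUSHER_KEY!, {
      cluster: process.env.NEXT_PUBLIC_PUSHER_CLUSTER!,
    });

    const channel = pusher.subscribe(`room-${roomId}`);
    channel.bind('new-message', (data: { message: string }) => {
      setMessages((prev) => [...prev, data.message]);
    });

    return () => pusher.disconnect();
  }, [roomId]);

  return (
    <ul>
      {messages.map((m, i) => <li key={i}>{m}</li>)}
    </ul>
  );
}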

The trade-off: operational simplicity (no WebSocket server to manage) vs. cost (per-message pricing at scale) and latency (extra hop through the managed service).

Server-Sent Events

Server-Sent Events (SSE) are a lighter-weight alternative to WebSockets for one-directional server-to-client streaming. They use regular HTTP, work through most proxies, and the browser's EventSource API reconnects automatically when the connection drops.
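On the wire, an SSE stream is plain text. Each event is a group of field: value lines terminated by a blank line; a data value containing newlines is split across several data: lines, an id: line is what the browser reports back on reconnect, and retry: suggests a reconnection delay in milliseconds. The formatter below is a hypothetical helper (not a Next.js API) that produces frames suitable for controller.enqueue in a Route Handler stream; it assumes event names and ids contain no newlines.

```typescript
interface SseEvent {
  data: unknown   // serialized as JSON unless it's already a string
  event?: string  // custom event name; clients listen via addEventListener(name, ...)
  id?: string     // sent back by the browser as Last-Event-ID on reconnect
  retry?: number  // reconnection delay hint, in milliseconds
}

export function formatSseEvent({ data, event, id, retry }: SseEvent): string {
  const lines: string[] = []
  if (event) lines.push(`event: ${event}`)
  if (id !== undefined) lines.push(`id: ${id}`)
  if (retry !== undefined) lines.push(`retry: ${Math.floor(retry)}`)
  const payload = typeof data === 'string' ? data : JSON.stringify(data) ?? ''
  // Every line of the payload needs its own "data:" prefix
  for (const line of payload.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`)
  // A blank line terminates the event
  return lines.join('\n') + '\n\n'
}

// A comment line (starts with ":"): EventSource ignores it, which makes it a
// cheap heartbeat for keeping idle connections alive through proxies
export const SSE_KEEPALIVE = ': keepalive\n\n'
```

Inside a stream's start() callback this would be used as controller.enqueue(encoder.encode(formatSseEvent({ data: { timestamp: Date.now() } }))).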

In Next.js Route Handlers, an SSE stream lasts as long as the server keeps the response open. On a self-hosted Node.js server that can be indefinitely; on Vercel Functions the stream ends when the function reaches its maximum duration, because SSE needs a persistent connection:

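ts
// app/api/events/route.ts
export const dynamic = 'force-dynamic'; // never prerender or cache this handler

export async function GET(request: Request) {
  const encoder = new TextEncoder();
  let interval: ReturnType<typeof setInterval> | undefined;

  const stream = new ReadableStream({
    start(controller) {
      // Send an event every second
      interval = setInterval(() => {
        const data = `data: ${JSON.stringify({ timestamp: Date.now() })}\n\n`;
        try {
          controller.enqueue(encoder.encode(data));
        } catch {
          clearInterval(interval); // stream already closed
        }
      }, 1000);

      // Stop producing when the client disconnects (if the runtime signals it)
      request.signal.addEventListener('abort', () => clearInterval(interval));
    },
    // Called when the consumer cancels the stream. A function returned
    // from start() is NOT treated as a cleanup hook, so cleanup lives here.
    cancel() {
      clearInterval(interval);
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
      'X-Accel-Buffering': 'no', // ask nginx-based proxies not to buffer the stream
    },
  });
}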

On Vercel, an SSE response can only stream until the function's maximum duration is reached. For streams that must stay open indefinitely, you need a self-hosted server or a managed SSE or real-time service.
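Whatever ends a stream (a platform duration limit, a deploy, a dropped network), EventSource reconnects on its own, and if the last event it received carried an id, it sends that id in a Last-Event-ID request header. A handler that keeps a short buffer of recent events can replay what the client missed instead of losing it. This is a sketch under stated assumptions: EventReplayBuffer is a hypothetical name, ids are assumed to be increasing integers, and the buffer lives in one process, so with several instances it would have to move to shared storage.

```typescript
interface BufferedEvent {
  id: number
  data: string
}

export class EventReplayBuffer {
  private events: BufferedEvent[] = []
  private nextId = 1
  private readonly capacity: number

  constructor(capacity = 100) {
    this.capacity = capacity
  }

  // Record a new event and return it with its assigned id
  push(data: string): BufferedEvent {
    const evt = { id: this.nextId++, data }
    this.events.push(evt)
    if (this.events.length > this.capacity) this.events.shift() // evict the oldest
    return evt
  }

  // Events the client hasn't seen, given the raw Last-Event-ID header value.
  // Returns null when some missed events were already evicted (client must resync).
  since(lastEventIdHeader: string | null): BufferedEvent[] | null {
    if (lastEventIdHeader === null) return [] // fresh connection: nothing to replay
    const lastId = Number(lastEventIdHeader)
    if (!Number.isInteger(lastId)) return null
    const oldest = this.events[0]?.id ?? this.nextId
    if (lastId < oldest - 1) return null // gap older than the buffer
    return this.events.filter((e) => e.id > lastId)
  }
}
```

In the Route Handler, this would pair with request.headers.get('last-event-id'): enqueue the replayed events first, or send a full snapshot when since() returns null.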

Self-Hosting on Kubernetes

For teams that need persistent connections, long-lived processes, or full infrastructure control, Kubernetes is the production target.

The canonical self-hosted architecture:

Kubernetes Cluster
├── Next.js Deployment (3-5 replicas)
│   ├── Port 3000 — HTTP server (standalone build)
│   └── Horizontal Pod Autoscaler — scale on CPU/request rate
├── WebSocket Service (separate Deployment)
│   ├── Port 8080 — WebSocket connections
│   └── Sticky sessions (for stateful connections)
├── Redis (shared cache, session store)
├── PostgreSQL (or RDS, Cloud SQL)
└── Ingress (nginx or Traefik)
    ├── / → Next.js service
    └── /ws → WebSocket service

The key difference from serverless: pods persist between requests. In-process state (Prisma connection pools, module-level singletons) survives across requests. This is why the Prisma globalThis singleton pattern (P-4) matters: on Kubernetes, the singleton is reused across thousands of requests per pod. On serverless, the singleton is created anew on each cold start and is reused only while that instance stays warm.
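The globalThis singleton pattern usually amounts to caching one instance per process on globalThis, so that re-evaluating a module (development hot reloads are the classic case) doesn't create a second client and a second connection pool. A generic sketch, with getSingleton as a hypothetical helper; the exact P-4 version may differ:

```typescript
// Cache one instance per key on globalThis so re-evaluated modules reuse it
const registry = globalThis as typeof globalThis & {
  __singletons?: Map<string, unknown>
}

export function getSingleton<T>(key: string, create: () => T): T {
  const map = (registry.__singletons ??= new Map<string, unknown>())
  if (!map.has(key)) {
    map.set(key, create()) // factory runs once per process for this key
  }
  return map.get(key) as T
}

// Usage (hypothetical): export const db = getSingleton('db', () => new PrismaClient())
```

On a pod, the factory runs once for the lifetime of the process. On serverless, it runs once per cold start, and the instance survives only as long as that function instance stays warm.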

Kubernetes deployment manifest (simplified):

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextjs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nextjs-app   # required; must match the pod template labels
  template:
    metadata:
      labels:
        app: nextjs-app
    spec:
      containers:
        - name: nextjs
          image: my-app:latest   # prefer an immutable tag or digest in production
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000

The livenessProbe and readinessProbe point to the health check Route Handler from P-14. Kubernetes sends traffic only to pods whose readinessProbe passes, and restarts a container whose livenessProbe keeps failing.
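P-14's handler isn't reproduced in this module. A minimal stand-in the probes could target is sketched below; createHealthHandler and the injected check functions are hypothetical, and Kubernetes counts any status from 200 to 399 as success, so the 503 path registers as a probe failure. Keeping external dependencies out of the liveness check is a common refinement, so that a database outage marks pods unready instead of restarting all of them.

```typescript
// lib/health.ts (hypothetical location), used by app/api/health/route.ts
type Check = () => Promise<void> // resolves when healthy, rejects otherwise

export function createHealthHandler(checks: Record<string, Check>) {
  return async function GET(): Promise<Response> {
    const results: Record<string, 'ok' | 'fail'> = {}
    // Run all dependency checks in parallel
    await Promise.all(
      Object.entries(checks).map(async ([name, check]) => {
        try {
          await check()
          results[name] = 'ok'
        } catch {
          results[name] = 'fail'
        }
      }),
    )
    const healthy = Object.values(results).every((r) => r === 'ok')
    return Response.json(
      { status: healthy ? 'ok' : 'degraded', checks: results },
      { status: healthy ? 200 : 503, headers: { 'Cache-Control': 'no-store' } },
    )
  }
}

// app/api/health/route.ts (checkDatabase is a hypothetical dependency check):
// export const dynamic = 'force-dynamic'
// export const GET = createHealthHandler({ database: checkDatabase })
```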

ISR on Self-Hosted Kubernetes

ISR on Kubernetes has the distributed cache invalidation problem from A-3 — each pod has its own filesystem cache. revalidatePath('/products') on Pod A doesn't invalidate Pod B.

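The solution: a Redis-backed cache handler (also from A-3). The configuration below points Next.js at that handler and sets cacheMaxMemorySize to 0, so pods don't also keep a private in-memory copy that could keep serving stale data after another pod revalidates. Alternatively, move off ISR's per-pod filesystem cache and use the 'use cache' directive backed by a remote cache.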

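ts
// next.config.ts
import type { NextConfig } from 'next';

const config: NextConfig = {
  // Custom handler that stores cache entries in Redis (see A-3)
  cacheHandler: require.resolve('./lib/cache-handler'),
  // Disable the default per-pod in-memory cache
  cacheMaxMemorySize: 0,
};

export default config;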

With a Redis cache handler, all pods share the same cache. Revalidation from any pod propagates to all pods via Redis.
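A cache handler is a class with get, set, and revalidateTag methods. The sketch below follows the shape of the in-memory example in the Next.js documentation but writes through a small key-value interface, so a Redis client wrapped to fit KvStore can replace MemoryKv. All names here are hypothetical, the handler interface has changed between Next.js versions, JSON serialization is a simplification (real cache values can contain binary data), and a production Redis version would keep a tag index rather than scanning every key.

```typescript
// Minimal async key-value contract; a Redis client can be wrapped to fit it
export interface KvStore {
  get(key: string): Promise<string | null>
  set(key: string, value: string): Promise<void>
  del(key: string): Promise<void>
  keys(): Promise<string[]>
}

// In-memory stand-in for local testing; in the cluster every pod would share Redis
export class MemoryKv implements KvStore {
  private data = new Map<string, string>()
  async get(key: string) { return this.data.get(key) ?? null }
  async set(key: string, value: string) { this.data.set(key, value) }
  async del(key: string) { this.data.delete(key) }
  async keys() { return [...this.data.keys()] }
}

interface Entry { value: unknown; lastModified: number; tags: string[] }

export function createCacheHandler(store: KvStore) {
  return class CacheHandler {
    readonly options: unknown
    constructor(options?: unknown) {
      this.options = options // Next.js passes its own options object here
    }

    async get(key: string): Promise<Entry | null> {
      const raw = await store.get(key)
      return raw ? (JSON.parse(raw) as Entry) : null
    }

    async set(key: string, data: unknown, ctx: { tags?: string[] } = {}) {
      const entry: Entry = { value: data, lastModified: Date.now(), tags: ctx.tags ?? [] }
      await store.set(key, JSON.stringify(entry))
    }

    // Delete every entry carrying one of the tags; because the store is shared,
    // the next read on any pod misses and regenerates the page
    async revalidateTag(tags: string | string[]) {
      const wanted = [tags].flat()
      for (const key of await store.keys()) {
        const entry = await this.get(key)
        if (entry?.tags.some((t) => wanted.includes(t))) await store.del(key)
      }
    }

    resetRequestCache() {}
  }
}
```

The module referenced by cacheHandler would then export createCacheHandler(yourRedisKv) as the module's export.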

Choosing the Right Deployment Model

Requirement	Serverless (Vercel)	Self-hosted (K8s)
Zero infrastructure ops	✅	❌
WebSockets	❌ (use separate service)	✅
Long-running background jobs	❌	✅
Per-request isolation	✅	❌ (requests share a pod's process)
Global edge distribution	✅ (built-in)	✅ (needs CDN setup)
ISR with multiple instances	✅ (Vercel handles it)	Requires Redis handler
Predictable cost at high volume	❌ (per-request pricing)	✅ (fixed pod cost)
Cold starts	Present (mitigatable)	None per request (pods start only on scale-out)

The practical decision: start with Vercel. Migrate specific services (WebSockets, background processing) to dedicated always-on infrastructure when you hit the serverless constraints. Only migrate the entire Next.js application to Kubernetes if you have specific reasons — usually cost at very high volume or regulatory requirements around where compute runs.

Where We Go From Here

A-15 — the final module — covers production observability: the tracing, metrics, logging, and alerting architecture that lets you understand what your application is doing in production, diagnose incidents quickly, and catch regressions before users report them.

WebSocket Authentication — The Part Nobody Documents

The deployment topology for WebSockets in production (covered earlier in this module) is clear. The authentication story is not — and it fails in ways that are hard to debug because they're browser-specific and often silent.

Your application authenticates via HttpOnly cookies. The browser sends cookies on every HTTP request automatically. When the browser opens a WebSocket connection, it sends an HTTP GET with an Upgrade: websocket header. Cookies are included in this upgrade request — so far so good.

The problem: the HTTP 101 Switching Protocols response establishes the WebSocket connection. After that, there's no more HTTP. The WebSocket protocol has no built-in mechanism to re-send cookies on subsequent messages. The authentication happens once, at connection time.

What this means operationally:

If the session expires while the socket is open, nothing notifies the WebSocket server. Unless the server tracks the session's expiry itself, the connection stays open with an expired session.
If the user logs out (session cookie is deleted), the WebSocket connection stays open. The client keeps receiving real-time updates for a user who has logged out.
Some environments (certain mobile browsers, proxy servers, load balancers) strip cookies from the WebSocket upgrade request. The connection goes through but arrives unauthenticated.
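The first two problems can be narrowed on the server by remembering, per socket, which user it belongs to and when that user's session ends. This is a hypothetical sketch: SessionTracker and the 4006/4007 close codes are assumptions; expiresAtMs should be the session's expiry (for example carried as an extra claim at connect time) rather than the expiry of whatever short-lived token was used to connect; and revokeUser would be called when the Next.js app reports a logout, for instance over the message broker it already uses to reach the socket service.

```typescript
// Anything with close(code, reason), like a ws WebSocket
interface Closable {
  close(code?: number, reason?: string): void
}

interface Session {
  userId: string
  expiresAtMs: number // when the user's session ends, in epoch ms
}

export class SessionTracker<S extends Closable = Closable> {
  private sessions = new Map<S, Session>()

  register(socket: S, userId: string, expiresAtMs: number): void {
    this.sessions.set(socket, { userId, expiresAtMs })
  }

  unregister(socket: S): void {
    this.sessions.delete(socket)
  }

  // Run on an interval (e.g. every 30s): close sockets whose session has lapsed
  sweepExpired(nowMs = Date.now()): number {
    let closed = 0
    for (const [socket, session] of this.sessions) {
      if (session.expiresAtMs <= nowMs) {
        socket.close(4006, 'Session expired')
        this.sessions.delete(socket)
        closed++
      }
    }
    return closed
  }

  // Call when the app reports a logout: close every socket for that user
  revokeUser(userId: string): number {
    let closed = 0
    for (const [socket, session] of this.sessions) {
      if (session.userId === userId) {
        socket.close(4007, 'Logged out')
        this.sessions.delete(socket)
        closed++
      }
    }
    return closed
  }
}
```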

Authentication Pattern: Token in Initial Message

The most reliable approach: don't rely on the upgrade request for authentication. Complete the HTTP handshake, then require the client to send an auth message as the first message over the socket.

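ts
// server/websocket-server.ts (separate Node.js service)
import { WebSocketServer } from 'ws'
import { verifyToken } from './lib/auth'
import { handleMessage } from './handlers' // application message handler (hypothetical)

const wss = new WebSocketServer({ port: 3001 })

wss.on('connection', (ws) => {
  let userId: string | null = null
  let authenticating = false

  // Require authentication within 5 seconds
  const authTimeout = setTimeout(() => {
    if (!userId) ws.close(4001, 'Authentication timeout')
  }, 5000)

  ws.on('message', async (data) => {
    // Malformed input must not throw out of the handler and crash the process
    let message: { type?: string; token?: string }
    try {
      message = JSON.parse(data.toString())
      if (typeof message !== 'object' || message === null) throw new Error('Not an object')
    } catch {
      ws.close(1007, 'Invalid message')
      return
    }

    // First message must be auth
    if (!userId) {
      if (authenticating) return // ignore messages while the token is being verified
      if (message.type !== 'auth' || typeof message.token !== 'string') {
        ws.close(4002, 'First message must be auth')
        return
      }

      authenticating = true
      try {
        const payload = await verifyToken(message.token)
        // Only accept tokens minted for WebSocket use
        if (payload.scope !== 'websocket' || typeof payload.sub !== 'string') {
          throw new Error('Wrong token type')
        }
        userId = payload.sub
        clearTimeout(authTimeout)
        ws.send(JSON.stringify({ type: 'auth_ok', userId }))
      } catch {
        ws.close(4003, 'Invalid token')
      } finally {
        authenticating = false
      }
      return
    }

    // Handle authenticated messages
    handleMessage(ws, userId, message)
  })

  ws.on('close', () => {
    clearTimeout(authTimeout)
    // Clean up any subscriptions for this userId
  })
})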

The client sends a short-lived access token (not the HttpOnly session cookie, which isn't accessible to JavaScript). Generate this token specifically for WebSocket authentication:

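ts
// app/api/ws-token/route.ts: Route Handler that issues a short-lived WS token
import { auth } from '@/auth'        // Auth.js v5 helper (assumed project setup)
import { signJWT } from '@/lib/jwt'  // hypothetical helper that signs with the primary secret

export async function GET() {
  const session = await auth()
  if (!session?.user?.id) return new Response('Unauthorized', { status: 401 })

  // Short-lived token: 60 seconds, enough to complete the WS handshake
  const wsToken = await signJWT(
    { sub: session.user.id, scope: 'websocket' },
    { expiresIn: '60s' }
  )

  // A credential must never be cached by the browser or a CDN
  return Response.json({ token: wsToken }, { headers: { 'Cache-Control': 'no-store' } })
}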

ts
// Client-side WebSocket connection
async function connectWebSocket() {
  // Get a short-lived token from your server
  const res = await fetch('/api/ws-token')
  if (!res.ok) throw new Error(`Could not get a WebSocket token (HTTP ${res.status})`)
  const { token } = await res.json()

  const ws = new WebSocket('wss://ws.yourapp.com')

  ws.onopen = () => {
    // First message: authenticate
    ws.send(JSON.stringify({ type: 'auth', token }))
  }

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data)
    if (msg.type === 'auth_ok') {
      // Now we can start sending/receiving application messages
      subscribeToUpdates(ws) // application-specific (hypothetical)
    }
  }

  return ws
}

Token Validation on Reconnect

WebSocket connections drop. Networks change. Mobile devices sleep. Your client reconnects every time. Each reconnect goes through the same auth flow: fetch a new short-lived token, send it as the first message.

Because the client fetches a new token for every connection attempt and each token expires after 60 seconds, a token captured from an earlier connection can't be replayed once that window has passed.

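ts
// Client reconnect logic with exponential backoff
function createReconnectingWebSocket(url: string) {
  let ws: WebSocket | null = null
  let reconnectDelay = 1000
  let shouldReconnect = true
  let retryTimer: ReturnType<typeof setTimeout> | undefined

  function scheduleReconnect() {
    if (!shouldReconnect) return
    retryTimer = setTimeout(connect, reconnectDelay)
    reconnectDelay = Math.min(reconnectDelay * 2, 30000) // cap at 30s
  }

  async function connect() {
    if (!shouldReconnect) return

    // Always fetch a fresh token before (re)connecting
    let token: string
    try {
      const res = await fetch('/api/ws-token')
      if (res.status === 401) {
        // Session is gone: retrying won't help
        window.location.href = '/login'
        return
      }
      if (!res.ok) throw new Error(`HTTP ${res.status}`)
      token = (await res.json()).token
    } catch {
      scheduleReconnect() // offline or server error: back off and retry
      return
    }

    const socket = new WebSocket(url)
    ws = socket

    socket.onopen = () => {
      reconnectDelay = 1000 // reset backoff on successful connection
      socket.send(JSON.stringify({ type: 'auth', token }))
    }

    socket.onclose = (event) => {
      if (!shouldReconnect) return
      if (event.code === 4003) {
        // Invalid token: likely session expired, redirect to login
        window.location.href = '/login'
        return
      }
      scheduleReconnect()
    }
  }

  void connect()

  return {
    disconnect: () => {
      shouldReconnect = false
      clearTimeout(retryTimer) // cancel any pending reconnect
      ws?.close()
    },
  }
}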
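Plain doubling has a weakness at scale: when a socket server restarts, every client it held disconnects at the same moment and retries on the same schedule, so the reconnects arrive in synchronized waves. A common remedy is full jitter, choosing a random delay between zero and the exponential ceiling. A sketch of that technique (backoffDelay and its defaults are illustrative); in a reconnect loop like the one above it would replace the fixed doubling, with the attempt counter reset to 0 after a successful connection:

```typescript
// Full-jitter exponential backoff: delay is uniform in [0, min(cap, base * 2^attempt))
export function backoffDelay(
  attempt: number,                     // 0 for the first retry
  baseMs = 1000,
  capMs = 30_000,
  random: () => number = Math.random,  // injectable for testing
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  return Math.floor(random() * ceiling)
}
```

Full jitter can produce near-zero delays; some implementations add a small floor so a client that is offline doesn't spin.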

JWT Secret Rotation with Live Connections

This is the scenario that causes the most pain: you need to rotate your JWT signing secret. On HTTP, it's straightforward: new tokens are signed with the new secret, and you keep accepting the old secret until the last token signed with it has expired. No active user is affected.

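On WebSockets, it's different. You have 40,000 active connections, each authenticated with a token signed with the old secret. Those connections were verified once, at connect time, so they stay trusted for as long as they stay open. If you're rotating because the old secret may have leaked, that trust is no longer justified; and if your server re-verifies stored tokens (per message or periodically), deleting the old secret makes all 40,000 connections fail verification at the same moment.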

The graceful rotation protocol:

Step 1: Add the new secret while keeping the old one. Token validation accepts either.

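ts
// lib/auth.ts
import { jwtVerify, type JWTPayload } from 'jose'

const JWT_SECRETS = [
  process.env.JWT_SECRET_NEW, // primary: used for signing
  process.env.JWT_SECRET_OLD, // secondary: accepted for validation only
]
  .filter((s): s is string => Boolean(s)) // skip unset secrets (e.g. after rotation)
  .map((s) => new TextEncoder().encode(s))

export async function verifyToken(token: string): Promise<JWTPayload> {
  for (const secret of JWT_SECRETS) {
    try {
      // jwtVerify resolves to { payload, protectedHeader }; callers need the payload
      const { payload } = await jwtVerify(token, secret, { algorithms: ['HS256'] })
      return payload
    } catch {
      continue // failed with this secret; try the next one
    }
  }
  throw new Error('Invalid token')
}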

Step 2: Issue a system-wide re-authentication request over the WebSocket itself.

ts
// Send to all connected clients
wss.clients.forEach((client) => {
  if (client.readyState === WebSocket.OPEN) {
    client.send(JSON.stringify({
      type: 'reauth_required',
      reason: 'secret_rotation',
      deadline: Date.now() + 60000, // 60 seconds to re-authenticate
    }))
  }
})

Step 3: Clients receive reauth_required, fetch a new short-lived token (signed with the new secret), and send it as an auth message. The connection continues uninterrupted for clients that re-authenticate.
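On the client, Step 3 is one more branch in the message handler. This sketch assumes the /api/ws-token endpoint from earlier in this section, treats the deadline as an epoch timestamp in milliseconds (so it relies on client and server clocks being roughly in sync), and uses hypothetical names throughout. The server side also has to accept an auth message on an already-authenticated connection and update that connection's record; the first-message handler shown earlier doesn't do that on its own.

```typescript
interface ReauthRequired {
  type: 'reauth_required'
  reason: string
  deadline: number // epoch ms after which the server may close the socket
}

// Structural subset of the browser WebSocket used here
interface SocketLike {
  readyState: number
  send(data: string): void
}

const OPEN = 1 // WebSocket.OPEN

// Fetch a fresh token and re-send the auth message before the deadline.
// Resolves true if the auth message went out in time.
export async function handleReauthRequired(
  ws: SocketLike,
  msg: ReauthRequired,
  fetchToken: () => Promise<string>,
  now: () => number = Date.now,
): Promise<boolean> {
  const token = await fetchToken()
  if (now() > msg.deadline || ws.readyState !== OPEN) return false
  ws.send(JSON.stringify({ type: 'auth', token }))
  return true
}

// Default token fetcher, assuming the /api/ws-token route from earlier
export async function fetchWsToken(): Promise<string> {
  const res = await fetch('/api/ws-token')
  if (!res.ok) throw new Error(`Token request failed: HTTP ${res.status}`)
  const body = (await res.json()) as { token: string }
  return body.token
}

// In the socket's onmessage handler (hypothetical wiring):
// if (msg.type === 'reauth_required') void handleReauthRequired(ws, msg, fetchWsToken)
```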

Step 4: After the deadline, close connections that haven't re-authenticated.

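ts
// After the 60-second deadline.
// connectionState is a hypothetical per-connection record shared with the auth
// handler, which updates it on every successful auth (e.g. noting which secret verified the token)
const connectionState = new WeakMap<WebSocket, { tokenSignedWithOldSecret: boolean }>()

setTimeout(() => {
  wss.clients.forEach((client) => {
    const state = connectionState.get(client)
    if (state?.tokenSignedWithOldSecret) {
      client.close(4004, 'Re-authentication required')
    }
  })
}, 60_000)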

Step 5: Remove the old secret from JWT_SECRETS. Deploy.

This protocol handles secret rotation with zero forced disconnections for clients that are online during the rotation window. Clients that are offline (mobile with bad network) reconnect normally after the rotation — they fetch a new token signed with the new secret.

Per-Connection Rate Limiting

One thing the WebSocket authentication story often misses: rate limiting at the connection and message level.

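ts
// Extends the server above: wss and verifyToken as defined earlier
const connectionCounts = new Map<string, number>() // userId → open connections (this process only)
const MAX_CONNECTIONS_PER_USER = 5 // limit tab sprawl

wss.on('connection', (ws) => {
  let userId: string | null = null

  ws.on('message', async (data) => {
    let msg: { type?: string; token?: string }
    try {
      msg = JSON.parse(data.toString())
      if (typeof msg !== 'object' || msg === null) throw new Error('Not an object')
    } catch {
      ws.close(1007, 'Invalid message')
      return
    }

    if (msg.type === 'auth' && !userId) {
      let sub: string
      try {
        const payload = await verifyToken(String(msg.token))
        if (typeof payload.sub !== 'string') throw new Error('Missing subject')
        sub = payload.sub
      } catch {
        ws.close(4003, 'Invalid token')
        return
      }

      // After authentication, check the per-user connection limit
      const existing = connectionCounts.get(sub) ?? 0
      if (existing >= MAX_CONNECTIONS_PER_USER) {
        ws.close(4005, 'Too many connections')
        return
      }

      userId = sub
      connectionCounts.set(sub, existing + 1)
      // ... rest of auth logic
    }
  })

  ws.on('close', () => {
    if (!userId) return // never counted
    const remaining = (connectionCounts.get(userId) ?? 1) - 1
    if (remaining > 0) connectionCounts.set(userId, remaining)
    else connectionCounts.delete(userId) // don't leak entries for departed users
  })
})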

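Without per-user connection limits, a single user (or a compromised account) can open thousands of connections and exhaust your server's file descriptor limit. The count above is per process: with several WebSocket pods behind the load balancer, each pod enforces its own limit, so a global cap needs the counter in shared storage such as Redis.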
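Connection caps don't stop a single socket from flooding the server with messages. A per-connection token bucket covers that level: each connection gets a burst allowance that refills at a steady rate, and a message that finds the bucket empty is rejected. A sketch with hypothetical names and limits; the clock is injectable so the logic can be tested without real time:

```typescript
// Per-connection token bucket: `capacity` messages of burst,
// refilled at `refillPerSecond` messages per second
export class TokenBucket {
  private readonly capacity: number
  private readonly refillPerSecond: number
  private readonly now: () => number
  private tokens: number
  private lastRefill: number

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity
    this.refillPerSecond = refillPerSecond
    this.now = now
    this.tokens = capacity // start full so a fresh connection can burst
    this.lastRefill = now()
  }

  // Returns true and spends one token if the message is allowed
  tryRemove(): boolean {
    const t = this.now()
    const elapsedSeconds = (t - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond)
    this.lastRefill = t
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}

// Wiring sketch (limits and close code 4008 are assumptions):
// const bucket = new TokenBucket(20, 5)
// ws.on('message', (data) => {
//   if (!bucket.tryRemove()) { ws.close(4008, 'Rate limit exceeded'); return }
//   // ...normal handling
// })
```

Closing the socket is the blunt option; dropping the message and sending an error frame is gentler for clients that only occasionally burst.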
