Module A-2·27 min read

DNS → CDN → edge → origin → streaming response anatomy, cold start cost breakdown, ISR revalidation internals, instrumentation.ts register() lifecycle, and instrumentation-client.js for browser SDK boot.

A-2 — The Full Request Lifecycle and Edge Infrastructure

Who this is for: Architects who need to understand every hop a request takes from the user's browser to your database and back, and what happens at each layer. This module explains why a page can have a 5ms TTFB or an 800ms TTFB depending on where in the infrastructure the request is handled.

The Complete Request Path

A request to a Next.js application in production touches up to five distinct layers. Understanding each one tells you where your latency comes from and where to intervene.

Browser
  → DNS resolution (cached after first request)
  → CDN edge node (geographically nearest to user)
      → Cache HIT: return immediately (~5-30ms TTFB)
      → Cache MISS: forward to origin
          → Next.js edge middleware (V8 isolate, ~1ms)
          → Next.js server (Node.js)
              → Full Route Cache HIT: return pre-rendered HTML
              → Full Route Cache MISS: render
                  → Data Cache / React.cache() / use cache
                  → Database / external APIs
                  ← Streaming response (chunked transfer encoding)
          ← Response with cache headers
      ← Forward to browser
  ← Progressive HTML rendering

A static page (○ in the build output) served from CDN cache has a TTFB of 5-30ms globally. A dynamic page (ƒ in the build output of Next.js 14 and later, λ in earlier versions) that hits the database has a TTFB of 100-500ms depending on database location, query complexity, and connection pool state. The difference between those two numbers is the entire caching architecture.
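
The path above can be read as a small decision tree. The sketch below is only an illustration of that tree, not anything Next.js exposes: the `Hop` names, `CacheState`, and `hopsFor` are hypothetical, and the model ignores details such as DNS caching.

```typescript
// Hypothetical model of the request path: which hops does a request touch?
type Hop = 'dns' | 'cdn' | 'middleware' | 'server' | 'data';

interface CacheState {
  cdnHit: boolean;            // CDN edge node already holds the response
  fullRouteCacheHit: boolean; // Next.js server holds pre-rendered HTML
}

function hopsFor(state: CacheState): Hop[] {
  const hops: Hop[] = ['dns', 'cdn'];
  if (state.cdnHit) return hops; // served at the edge, origin never involved

  hops.push('middleware', 'server');
  // Only a Full Route Cache miss forces a render that reaches the data layer
  if (!state.fullRouteCacheHit) hops.push('data');
  return hops;
}
```

Each extra hop in the returned list is a place where latency accumulates, which is why a CDN hit and a full render differ by an order of magnitude.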

What the CDN Layer Does

The CDN sits in front of your Next.js server. For static routes, the CDN stores the pre-rendered HTML and RSC payloads — every request for that route gets served directly from the CDN node nearest the user. The Next.js server is not involved at all.

Next.js communicates with the CDN through HTTP cache headers it generates automatically:

Cache-Control: public, max-age=31536000, immutable
# ↑ Static assets (JS, CSS, fonts) — cache forever, version via hash in filename

Cache-Control: s-maxage=31536000, stale-while-revalidate
# ↑ Static HTML pages — cache on CDN, revalidate in background when stale

Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate
# ↑ Dynamic pages — CDN does not cache, every request hits origin

On Vercel, this happens automatically — Vercel's CDN understands Next.js's cache model. On self-hosted deployments behind Nginx or Cloudflare, you need to either pass these headers through to the CDN or configure CDN caching rules explicitly.
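
When configuring a self-hosted CDN, it can help to check how a given header would be treated. The following is a simplified sketch that classifies the three header shapes shown above; `parseCacheControl`, `classify`, and the `CdnBehaviour` labels are hypothetical names, and real CDNs apply additional rules of their own.

```typescript
type CdnBehaviour = 'immutable-asset' | 'cdn-cached' | 'origin-every-time';

// Split "a, b=1" into a map of lower-cased directive names to values
function parseCacheControl(header: string): Map<string, string | true> {
  const directives = new Map<string, string | true>();
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=');
    if (name) directives.set(name.toLowerCase(), value ?? true);
  }
  return directives;
}

function classify(header: string): CdnBehaviour {
  const d = parseCacheControl(header);
  // private / no-store forbid shared caches from storing the response
  if (d.has('private') || d.has('no-store')) return 'origin-every-time';
  if (d.has('immutable')) return 'immutable-asset';
  if (d.has('s-maxage') || d.has('public')) return 'cdn-cached';
  // Assumption: with no recognised directive, treat the response as uncached
  return 'origin-every-time';
}
```

Running `classify` against the headers your deployment actually emits (for example from `curl -I`) is a quick way to spot a proxy that strips or rewrites them.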

Cold Start Anatomy

In serverless environments (Vercel Functions, AWS Lambda), your Node.js process doesn't exist until someone requests it. The cold start is the time it takes to create the process, load your application code, and handle the first request.

Cold start breakdown:

1. Container provisioning         ~50-200ms  (platform overhead — not your code)
2. Node.js runtime init           ~20-50ms
3. Module loading (require/import) ~50-300ms  ← your biggest lever
4. Prisma client initialisation   ~20-100ms
5. First request handling         your actual code

Module loading is where your application has the most control. The more code you import at the top level, the longer the cold start.

Strategies to reduce cold start:

ts
// ❌ Importing everything at the top level: every cold start loads all three SDKs
import Stripe from 'stripe';
import { Resend } from 'resend';
import { S3Client } from '@aws-sdk/client-s3';

// ✅ Dynamic imports for infrequently used heavy modules
// A type-only import is erased at compile time, so it adds no load cost
import type StripeTypes from 'stripe';

export async function handleStripeWebhook(event: StripeTypes.Event) {
  const { default: Stripe } = await import('stripe');
  // Stripe only loads when this function actually runs
}
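
A dynamic import is often paired with lazy initialisation, so the client is also constructed only once. Below is a minimal sketch of that idea; `lazy`, `getClient`, and `handleWebhook` are hypothetical names, and the object returned by the initialiser is a stand-in for a real SDK client.

```typescript
// Run an async initialiser on first use and reuse the same promise afterwards.
// Note: a rejected initialisation stays cached in this simple version.
function lazy<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= init());
}

// Stand-in for a heavy SDK. In an application the initialiser might instead
// await import('stripe') and construct the client there.
let initialisations = 0;
const getClient = lazy(async () => {
  initialisations += 1;
  return { charge: (cents: number) => `charged ${cents}` };
});

async function handleWebhook(cents: number): Promise<string> {
  const client = await getClient(); // first caller pays, later callers reuse
  return client.charge(cents);
}
```

Concurrent callers share the same pending promise, so the initialiser runs once even if several requests arrive during a cold start.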

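ts
// ❌ New Prisma client (and connection pool) created on every request
import { PrismaClient } from '@prisma/client';

export async function GET() {
  const prisma = new PrismaClient();
  const data = await prisma.posts.findMany();
  return Response.json(data);
}

// ✅ Singleton: initialised once per cold start, reused across warm invocations
import { db } from '@/lib/db'; // the globalThis singleton from P-4

export async function GET() {
  const data = await db.posts.findMany();
  return Response.json(data);
}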
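
The P-4 implementation is not reproduced here, but the general shape of a globalThis singleton can be sketched as follows. The `singleton` helper, the `__fakeDb` key, and `FakeClient` are hypothetical; a real `@/lib/db` would construct a `PrismaClient` instead.

```typescript
// Store one instance per key on globalThis so repeated module evaluation
// (for example during development hot reload) does not create a new one.
function singleton<T>(key: string, create: () => T): T {
  const store = globalThis as unknown as Record<string, unknown>;
  if (!(key in store)) store[key] = create();
  return store[key] as T;
}

// Stand-in for an expensive client such as a database connection pool
class FakeClient {
  static instances = 0;
  constructor() {
    FakeClient.instances += 1;
  }
}

const db = singleton('__fakeDb', () => new FakeClient());
const sameDb = singleton('__fakeDb', () => new FakeClient()); // factory not called again
```

In production, module scope alone already survives across warm invocations; the globalThis indirection mainly protects against duplicate clients when modules are re-evaluated.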

On Vercel, you can see cold start frequency in the Functions tab. A rate of >5% cold starts on production traffic warrants optimisation. For critical paths (login, checkout), consider warming strategies: scheduled pings every 5 minutes, or Vercel's Fluid compute, which reuses already-running instances across requests so that fewer requests pay the cold start cost.
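
If you export invocation logs, the 5% threshold is simple to check yourself. This sketch assumes a hypothetical log shape with a `coldStart` flag per invocation; the field name and both function names are illustrative only.

```typescript
interface Invocation {
  coldStart: boolean; // assumed field: true when the platform booted a new instance
}

// Fraction of invocations that were cold starts (0 for an empty sample)
function coldStartRate(invocations: Invocation[]): number {
  if (invocations.length === 0) return 0;
  const cold = invocations.filter((i) => i.coldStart).length;
  return cold / invocations.length;
}

// The 5% default mirrors the rule of thumb in the text above
function needsOptimisation(rate: number, threshold = 0.05): boolean {
  return rate > threshold;
}
```

Measure this per route where you can: a low overall rate can hide a rarely hit but critical path that is cold on most requests.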

Streaming Response Anatomy

When a Next.js page uses Suspense and has async Server Components, the response is streamed using HTTP chunked transfer encoding. This is worth understanding mechanically because it's what makes Next.js pages feel fast even when data is slow.

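The server sends the HTTP headers immediately. Over HTTP/1.1 these include Transfer-Encoding: chunked; HTTP/2 and HTTP/3 have no such header and deliver the same incremental output as a sequence of DATA frames. Then it sends HTML in chunks as content becomes available: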

--- Initial flush (immediate) ---
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>...</head>
<body>
  <nav>...</nav>              ← layout renders immediately
  <h1>Product Title</h1>     ← static product data renders immediately
  <!--$?--><template id="B:0"></template><!--/$-->  ← Suspense placeholder
</body>

--- Second chunk (when ProductReviews resolves, ~300ms later) ---
<div hidden id="S:0">
  <ul class="reviews">...</ul>   ← real reviews content
</div>
<script>
  $RC("B:0", "S:0")   ← React's built-in function swaps placeholder with content
</script>

The browser renders the first chunk immediately — the user sees the page layout and product title while reviews are still loading. When the second chunk arrives, React's $RC function (from the streaming runtime) moves the content into the Suspense placeholder without a full re-render.

This is fundamentally different from a client-side skeleton loader. The skeleton loader delivers an empty page, then makes a separate fetch for data, then renders. Streaming delivers real server-rendered content progressively — the first paint already has meaningful content.
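
The ordering of the two chunks can be modelled with an async generator. This is a toy sketch of the flush order only, not React's streaming renderer: `renderProductPage`, `collect`, and `loadReviews` are hypothetical, and the review strings are assumed to be already escaped.

```typescript
// Yield the shell first, then the late content plus the swap instruction.
async function* renderProductPage(
  loadReviews: () => Promise<string[]>,
): AsyncGenerator<string> {
  const pendingReviews = loadReviews(); // start the slow work before flushing

  // Chunk 1: everything that is ready now, with a Suspense-style placeholder
  yield '<h1>Product Title</h1><!--$?--><template id="B:0"></template><!--/$-->';

  // Chunk 2: sent only once the slow data resolves
  const reviews = await pendingReviews;
  const items = reviews.map((r) => `<li>${r}</li>`).join('');
  yield `<div hidden id="S:0"><ul class="reviews">${items}</ul></div>` +
    '<script>$RC("B:0", "S:0")</script>';
}

// Helper that drains a stream into an array, preserving arrival order
async function collect(stream: AsyncIterable<string>): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) chunks.push(chunk);
  return chunks;
}
```

A consumer that writes each yielded string to the response as it arrives reproduces the behaviour described above: the title is on screen while the reviews promise is still pending.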

ISR Revalidation Internals

ISR (Incremental Static Regeneration) uses a stale-while-revalidate model. Understanding exactly how the timing works prevents a common mistake: thinking revalidation happens on a schedule.

ISR does not run on a schedule. It runs on request, with a cooldown.

Timeline for a page with revalidate: 60:

t=0    First request: page renders, result cached. Timestamp recorded.
t=30   Request: cache age = 30s < 60s. Serve cached version. No revalidation.
t=61   Request: cache age = 61s > 60s. Serve cached version (stale). Background revalidation starts.
t=61   Background: page re-renders, new result stored, timestamp updated.
t=62   Request: new cache hit. Fresh content served.

The user at t=61 gets the stale page; there is no way around this in ISR. The stale serve is intentional: users get a response immediately without waiting for revalidation. The background re-render only benefits requests that arrive after it completes.
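
The timeline can be reproduced with a small simulation. This is a sketch of the stale-while-revalidate rule only, not Next.js's cache implementation: `IsrSimulator` is a hypothetical class, time is passed in explicitly as seconds, and the background re-render is compressed into the same call for determinism.

```typescript
type Status = 'miss' | 'fresh' | 'stale';

class IsrSimulator {
  private entry?: { html: string; renderedAt: number };

  constructor(
    private readonly revalidateSeconds: number,
    private readonly render: (now: number) => string,
  ) {}

  request(now: number): { html: string; status: Status } {
    if (!this.entry) {
      // First request: render and record the timestamp
      this.entry = { html: this.render(now), renderedAt: now };
      return { html: this.entry.html, status: 'miss' };
    }
    const age = now - this.entry.renderedAt;
    if (age <= this.revalidateSeconds) {
      return { html: this.entry.html, status: 'fresh' };
    }
    // Stale: respond with the old HTML, then refresh the entry for later requests.
    // In a real server the re-render happens in the background after the response.
    const staleHtml = this.entry.html;
    this.entry = { html: this.render(now), renderedAt: now };
    return { html: staleHtml, status: 'stale' };
  }
}

const page = new IsrSimulator(60, (now) => `rendered@${now}`);
page.request(0);  // { html: 'rendered@0',  status: 'miss' }
page.request(30); // { html: 'rendered@0',  status: 'fresh' }
page.request(61); // { html: 'rendered@0',  status: 'stale' }
page.request(62); // { html: 'rendered@61', status: 'fresh' }
```

Note that nothing happens between requests: with no traffic after t=30, the entry would stay at `rendered@0` indefinitely, which is the "on request, with a cooldown" behaviour.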

Tag-based ISR changes the trigger: instead of time-based, the revalidation fires when you call revalidateTag(). The next request after the tag invalidation re-renders and re-caches. Same stale-first model, but the staleness is bounded by your mutation cadence rather than a fixed TTL.

The `instrumentation.ts` Hook

instrumentation.ts (at the project root, or inside src/ if the project uses that layout) exports a register() function that Next.js calls once per server instance at startup, before the first request is handled. It's the correct place for:

OpenTelemetry provider setup
Sentry initialisation
Database connection pool warm-up
Custom APM agent registration

ts
// instrumentation.ts

export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    // Only runs in Node.js runtime, not edge runtime
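    const { NodeSDK } = await import('@opentelemetry/sdk-node');
    const { getNodeAutoInstrumentations } = await import('@opentelemetry/auto-instrumentations-node');
    // OTLP over HTTP is used here as an example; swap in the exporter for your backend
    const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-http');

    const sdk = new NodeSDK({
      traceExporter: new OTLPTraceExporter(),
      instrumentations: [getNodeAutoInstrumentations()],
    });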

    sdk.start();
  }

  if (process.env.NEXT_RUNTIME === 'edge') {
    // Edge-specific initialisation
  }
}

The NEXT_RUNTIME environment variable tells you which runtime is executing: 'nodejs' for the main server, 'edge' for Middleware and edge Route Handlers. Use it to guard runtime-specific code.

instrumentation.ts is not Middleware. Middleware runs on every request. register() runs once at startup. Don't confuse them.

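On Next.js 14, enable the hook explicitly. From Next.js 15 it is stable and the flag is no longer needed:

js
// next.config.mjs (Next.js 14 does not support next.config.ts)
/** @type {import('next').NextConfig} */
const config = {
  experimental: {
    instrumentationHook: true, // Required in Next.js 14; remove on 15 and later
  },
};

export default config;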

`instrumentation-client.js` — The Browser Mirror

instrumentation-client.js (or .ts), available from Next.js 15.3, is the client-side equivalent: its top-level code runs once in the browser when the Next.js application boots, after the HTML document has loaded and before React hydration begins. Unlike instrumentation.ts, it does not export a register() function; the file body itself is the initialisation code.

ts
// instrumentation-client.ts
import { AnalyticsBrowser } from '@segment/analytics-next';

// Analytics initialisation: top-level code runs once on browser boot
const analytics = AnalyticsBrowser.load({
  writeKey: process.env.NEXT_PUBLIC_SEGMENT_KEY!,
});

// Runs at the start of every client-side navigation (the SPA page view)
export function onRouterTransitionStart(
  url: string,
  navigationType: 'push' | 'replace' | 'traverse',
) {
  analytics.page({ path: url });
}

onRouterTransitionStart fires when an App Router client-side navigation begins, which is how you track page views in a Next.js SPA without useEffect-based tracking in every page component.

Together, the top-level initialisation and this export replace the previous pattern of putting analytics initialisation in _app.tsx (Pages Router) or in a Client Component that wraps the root layout.

Edge vs Node.js Runtime for Route Handlers

Route Handlers run in the Node.js runtime by default. You can opt individual handlers into the edge runtime:

ts
// app/api/fast-check/route.ts
export const runtime = 'edge';

export async function GET(request: Request) {
  // Runs in a V8 isolate — no Node.js APIs, near-zero cold start
  const country = request.headers.get('x-vercel-ip-country') ?? 'unknown';
  return Response.json({ country });
}

Edge runtime advantages:

Near-zero cold start (V8 isolate, not a full Node.js process)
Runs at CDN edge globally — geographically close to users
Ideal for simple, fast responses: geo-detection, auth checks, feature flag evaluations

Edge runtime limitations:

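Most Node.js APIs are unavailable (fs, native modules, much of the crypto module); Web Crypto is available instead
No standard Prisma Client (it relies on Node.js internals; edge use needs Prisma Accelerate or an edge-compatible driver adapter)
No bcrypt, and most Node.js-based libraries will not load
Tight bundle size limit (on Vercel, from 1MB depending on plan)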
Neon's HTTP driver and Upstash Redis work at the edge; most other databases don't

The decision: default to Node.js runtime. Move to edge only when the handler is simple enough to work without Node.js APIs and you need the latency benefit.
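
As an illustration of logic that is simple enough for the edge, here is a percentage-rollout feature flag check that uses no Node.js APIs at all. It is a sketch under stated assumptions: `fnv1a` is an FNV-1a-style hash over UTF-16 code units rather than bytes, and `isEnabled`, the flag name, and the user id format are hypothetical.

```typescript
// FNV-1a-style 32-bit hash using only plain arithmetic (edge-safe)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, 32-bit wrap
  }
  return hash >>> 0; // force unsigned
}

// Deterministically place a user in a bucket 0..99 and compare to the rollout
function isEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < rolloutPercent;
}
```

Because the result depends only on the flag name and user id, every edge location gives the same answer for the same user without any shared state or database call.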

Where We Go From Here

A-3 goes deep into the caching architecture internals — the mechanics of the tag registry, cache stampede and how to prevent it, the use cache directive's static analysis, custom cache handlers for distributed deployments. With A-1's understanding of Flight and A-2's understanding of the infrastructure, A-3 explains how data moves through the system and stays fresh.
