Back/Module P-5 JSON and JSONB — Working With Semi-Structured Data
Module P-5·18 min read

JSONB vs JSON, containment operators, GIN indexes, and when JSONB is the right tool vs. a schema design shortcut.

P-5 — JSON and JSONB: Working With Semi-Structured Data

Who this module is for: You have normalised relational schemas down. Sometimes, though, data does not fit neatly into fixed columns — product attributes vary by category, user preferences are open-ended, API responses have unpredictable shapes. PostgreSQL's JSONB gives you a relational database that also handles document-style data. This module covers when to reach for it, how to query it efficiently, and when using it is a mistake that will cost you later.


JSON vs JSONB — Choose JSONB

PostgreSQL has two JSON types:

JSON — stores the raw text of the JSON document, preserving whitespace, duplicate keys, and key ordering. Validation only on insert; no indexing support. Slower to process because it re-parses on every operation.

JSONB — stores JSON in a decomposed binary format. Duplicate keys are removed (last value wins), key ordering is not preserved, but: supports indexing, supports all operators, and processes faster. Use JSONB for everything.

sql
-- JSONB is almost always the right choice CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, attributes JSONB -- variable product attributes );

Inserting JSONB

sql
-- Insert JSON objects directly INSERT INTO products (name, attributes) VALUES ('Laptop', '{"ram_gb": 16, "storage_gb": 512, "color": "silver", "ports": ["USB-C", "HDMI"]}'), ('T-Shirt', '{"size": "M", "color": "blue", "material": "cotton", "tags": ["casual", "summer"]}'), ('Headphones', '{"wireless": true, "battery_hours": 30, "color": "black"}'); -- Using jsonb_build_object for programmatic construction INSERT INTO products (name, attributes) VALUES ( 'Monitor', jsonb_build_object( 'resolution', '4K', 'refresh_rate', 60, 'hdr', true, 'connections', jsonb_build_array('HDMI', 'DisplayPort', 'USB-C') ) );

Querying JSONB — The Operator Reference

Extracting values

sql
-- -> returns JSONB (useful for nested objects/arrays) SELECT attributes -> 'color' FROM products; -- Returns: "silver" (as JSONB with quotes) -- "blue" -- "black" -- null (Monitor has no 'color' key) -- ->> returns TEXT (useful for comparisons and display) SELECT attributes ->> 'color' FROM products; -- Returns: silver (as plain text, no quotes) -- blue -- black -- (null) -- Navigate nested paths SELECT attributes -> 'specs' -> 'cpu' FROM products; -- nested JSONB SELECT attributes #> '{specs, cpu}' FROM products; -- path array syntax SELECT attributes #>> '{specs, cpu}' FROM products; -- as TEXT -- Array element by index (0-based) SELECT attributes -> 'ports' -> 0 FROM products; -- first port SELECT attributes -> 'tags' ->> 1 FROM products; -- second tag as text

Filtering with JSONB operators

sql
-- @> containment: does the JSONB contain this sub-document? SELECT name FROM products WHERE attributes @> '{"color": "black"}'; -- Returns: Headphones (has color=black) SELECT name FROM products WHERE attributes @> '{"wireless": true}'; -- Returns: Headphones -- Filter by array element SELECT name FROM products WHERE attributes @> '{"ports": ["HDMI"]}'; -- Returns: Laptop (its ports array contains "HDMI") -- ? key exists SELECT name FROM products WHERE attributes ? 'wireless'; -- Returns products that have a 'wireless' key -- ?| any key exists SELECT name FROM products WHERE attributes ?| ARRAY['wireless', 'hdr']; -- Returns products that have 'wireless' OR 'hdr' -- ?& all keys exist SELECT name FROM products WHERE attributes ?& ARRAY['color', 'ram_gb']; -- Returns products that have BOTH 'color' AND 'ram_gb' -- Compare extracted value as text SELECT name FROM products WHERE (attributes ->> 'battery_hours')::INTEGER > 20; -- Cast to INTEGER for numeric comparison

Modifying JSONB

sql
-- Add or update a key UPDATE products SET attributes = attributes || '{"in_stock": true}' WHERE name = 'Laptop'; -- || merges two JSONB objects; right side wins on key conflicts -- Set a nested value UPDATE products SET attributes = jsonb_set(attributes, '{specs, ram_gb}', '32') WHERE name = 'Laptop'; -- jsonb_set(target, path_array, new_value) -- Remove a key UPDATE products SET attributes = attributes - 'color' WHERE name = 'T-Shirt'; -- - operator removes a key -- Remove a nested key UPDATE products SET attributes = attributes #- '{specs, old_field}' WHERE name = 'Laptop'; -- Add an element to an array UPDATE products SET attributes = jsonb_set( attributes, '{ports}', (attributes -> 'ports') || '"Thunderbolt"' ) WHERE name = 'Laptop';

Building JSON from Relational Data

sql
-- Build a JSON object from columns SELECT jsonb_build_object( 'id', id, 'name', name, 'price', price ) FROM products; -- Build a JSON array of objects (aggregation) SELECT jsonb_agg( jsonb_build_object('id', id, 'name', name) ) AS products_json FROM products WHERE in_stock = true; -- row_to_json: convert an entire row to JSON SELECT row_to_json(products.*) FROM products; -- Build nested JSON with joins SELECT c.name AS customer, jsonb_build_object( 'total_orders', COUNT(o.id), 'total_spent', SUM(o.total), 'recent_orders', jsonb_agg( jsonb_build_object('id', o.id, 'total', o.total) ORDER BY o.created_at DESC ) FILTER (WHERE o.id IS NOT NULL) ) AS stats FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name;

GIN Indexes — Making JSONB Queries Fast

Without an index, every JSONB query scans every row. With a GIN index, containment (@>) and key existence (?, ?|, ?&) queries are fast.

sql
-- GIN index on the entire JSONB column (jsonb_ops — default) CREATE INDEX idx_products_attributes ON products USING GIN (attributes); -- Now these queries use the index: SELECT * FROM products WHERE attributes @> '{"color": "black"}'; SELECT * FROM products WHERE attributes ? 'wireless'; -- GIN index with jsonb_path_ops (smaller, only supports @> operator) CREATE INDEX idx_products_attributes_path ON products USING GIN (attributes jsonb_path_ops); -- Smaller index, faster @> queries, but does not support ? operators

GIN index with specific path (index only one field — more efficient):

sql
-- If you always query by color, index just that path CREATE INDEX idx_products_color ON products ((attributes ->> 'color')); -- Regular B-tree index on the extracted text value -- Efficient for: SELECT * FROM products WHERE attributes ->> 'color' = 'black'; -- Does NOT help with @> containment

JSONB Query Path Language (jsonpath)

PostgreSQL 12+ includes jsonpath — a mini-language for complex JSONB queries:

sql
-- Find products where any port is 'HDMI' SELECT name FROM products WHERE attributes @? '$.ports[*] ? (@ == "HDMI")'; -- Find products with battery > 20 hours SELECT name FROM products WHERE attributes @? '$.battery_hours ? (@ > 20)'; -- Extract matching elements SELECT jsonb_path_query(attributes, '$.ports[*]') AS port FROM products WHERE name = 'Laptop'; -- Returns each port as a separate row

When JSONB Is the Right Tool

Use JSONB when:

  1. The schema is genuinely variable — product attributes differ by category (electronics have voltage, clothing has size/material), and you cannot enumerate all attributes upfront

  2. Semi-structured external data — storing API responses, webhook payloads, or log data where the shape is not fully under your control

  3. Infrequently queried sub-attributes — metadata that you store for display but rarely filter on

  4. Rapid prototyping — the schema is not yet stable; JSONB buys time to define it

sql
-- Good use of JSONB: product metadata with variable structure CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, price NUMERIC(10,2) NOT NULL, category TEXT NOT NULL, metadata JSONB -- category-specific attributes ); -- Electronics: {"voltage": 110, "wattage": 65, "warranty_years": 2} -- Clothing: {"size": "M", "material": "cotton", "care": "machine wash"} -- Books: {"isbn": "978-...", "pages": 320, "publisher": "O'Reilly"}

When JSONB Is the Wrong Tool

Do NOT use JSONB when:

  1. The data is structured and you query individual fields — if you always WHERE attributes->>'status' = 'active', that should be a proper TEXT NOT NULL column with a B-tree index

  2. You need referential integrity — JSONB values cannot be foreign keys; you cannot REFERENCES a value inside a JSONB document

  3. You need constraints on sub-fields — you cannot add CHECK (attributes->>'price' > 0) in a meaningful way; use a real column

  4. The data is always the same shape — if every row has {"first_name": "...", "last_name": "..."}, that is two columns, not a JSONB object

sql
-- ❌ BAD: using JSONB to avoid thinking about schema CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, data JSONB -- {"email": "...", "name": "...", "role": "...", "created_at": "..."} ); -- No NOT NULL constraints, no UNIQUE on email, no type safety -- Every query must extract from JSONB instead of using columns -- ✅ GOOD: use proper columns for structured data CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email TEXT NOT NULL UNIQUE, name TEXT NOT NULL, role TEXT NOT NULL DEFAULT 'user', created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), preferences JSONB -- only truly variable user preferences );

Practical Exercise: Product Catalog with Variable Attributes

sql
CREATE TABLE catalog_products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, category TEXT NOT NULL, base_price NUMERIC(10,2) NOT NULL, attributes JSONB NOT NULL DEFAULT '{}' ); CREATE INDEX idx_catalog_attributes ON catalog_products USING GIN (attributes); INSERT INTO catalog_products (name, category, base_price, attributes) VALUES ('iPhone 15', 'phones', 999.00, '{"storage_gb": 128, "color": "black", "5g": true, "os": "iOS 17"}'), ('Galaxy S24', 'phones', 899.00, '{"storage_gb": 256, "color": "white", "5g": true, "os": "Android 14"}'), ('MacBook Air', 'laptops', 1299.00, '{"ram_gb": 16, "storage_gb": 512, "chip": "M3", "color": "silver"}'), ('Blue Jeans', 'clothing', 79.00, '{"size": "32x32", "color": "blue", "material": "denim", "fit": "slim"}'); -- Q1: All 5G phones SELECT name, base_price FROM catalog_products WHERE category = 'phones' AND attributes @> '{"5g": true}'; -- Q2: Products available in black SELECT name, category FROM catalog_products WHERE attributes @> '{"color": "black"}'; -- Q3: Laptops with 16GB RAM SELECT name, base_price FROM catalog_products WHERE category = 'laptops' AND (attributes ->> 'ram_gb')::INTEGER >= 16; -- Q4: All unique colors across all products SELECT DISTINCT attributes ->> 'color' AS color FROM catalog_products WHERE attributes ? 'color' ORDER BY color; -- Q5: Products grouped by OS SELECT attributes ->> 'os' AS os, COUNT(*) AS count, jsonb_agg(name) AS products FROM catalog_products WHERE attributes ? 'os' GROUP BY attributes ->> 'os';

Summary

ConceptKey Takeaway
JSON vs JSONBAlways use JSONB — indexable, faster processing
->Extract as JSONB (preserves type for nested access)
->>Extract as TEXT (use for comparisons)
@>Containment check — main operator for JSONB filtering
? / `?/?&`
`
jsonb_set()Update a nested value without replacing entire document
GIN indexRequired for fast @> and ? queries on large tables
jsonb_build_object()Construct JSONB in queries
jsonb_agg()Aggregate rows into a JSON array
Good use casesVariable attributes, external API data, open-ended metadata
Bad use casesStructured data you always query, data needing FK constraints or CHECK constraints

Module P-6 covers access control — how to set up roles with the right permissions, and how Row-Level Security policies make data isolation automatic at the database level.

Next: P-6 — Authentication, Row-Level Security, and Access Control →

Discussion