Module P-7·23 min read

tsvector, tsquery, GIN indexes, relevance ranking, and generated tsvector columns — search without Elasticsearch.

P-7 — Full-Text Search

The standard answer to "we need search" is Elasticsearch. Spin up a cluster, sync your data, write query DSL, manage index mappings. For large-scale dedicated search, that trade-off sometimes makes sense. For most application search — finding documents, filtering content, search-as-you-type boxes — you already have everything you need inside PostgreSQL.

This module covers PostgreSQL's full-text search system: how it represents and indexes text, how queries work, how to rank results by relevance, and how to set it up so searches stay fast at scale.

The Problem With `LIKE`

The first instinct for search is LIKE:

sql
SELECT * FROM articles WHERE body LIKE '%postgresql performance%';

This works. It also requires a full sequential scan of the table for every search — no index can help with a leading wildcard. At a thousand rows it is fast. At a million rows it is a problem.

ILIKE (case-insensitive) is slower still. And neither handles linguistic variations: a search for "running" will not find articles containing "runs" or "ran."

Full-text search solves both problems: it is indexable and it understands language.

The Two Core Types

PostgreSQL full-text search is built around two data types:

tsvector — a preprocessed representation of a document. It is a sorted list of lexemes (normalized word stems) with their positions in the original text. When you convert text to tsvector, PostgreSQL normalizes words (removing stop words like "the", stemming "running" → "run"), and records where each term appeared.

tsquery — a search query. It is a boolean expression of lexemes that gets matched against a tsvector.

sql
-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Note: "The", "over" removed (stop words), "jumps"→"jump", "lazy"→"lazi"

-- Convert a search query to tsquery
SELECT to_tsquery('english', 'jump & fox');
-- Result: 'jump' & 'fox'

-- Match a tsvector against a tsquery with the @@ operator
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
    @@ to_tsquery('english', 'jump & fox');
-- Result: true

The @@ operator is the match operator. It returns true if the tsvector satisfies the tsquery.

Building Queries with `tsquery`

`to_tsquery`

The most explicit form. Uses boolean operators:

sql
-- AND: both terms must be present
SELECT to_tsquery('english', 'postgres & performance');

-- OR: either term
SELECT to_tsquery('english', 'postgres | mysql');

-- NOT: exclude a term
SELECT to_tsquery('english', 'postgres & !mysql');

-- Phrase search (terms must be adjacent, in order)
SELECT to_tsquery('english', 'postgres <-> performance');

-- Proximity (within N words)
SELECT to_tsquery('english', 'postgres <2> performance');

to_tsquery requires valid tsquery syntax. If the user types bare text, it will error.

`plainto_tsquery`

Takes plain text and converts it to an AND query. Safe for direct user input.

sql
SELECT plainto_tsquery('english', 'postgresql performance tuning');
-- Result: 'postgresql' & 'perform' & 'tune'

`websearch_to_tsquery`

Understands Google-style search syntax. The best choice for search boxes.

sql
SELECT websearch_to_tsquery('english', 'postgresql "full text" -mysql');
-- postgres & 'full' <-> 'text' & !'mysql'

-- Quoted phrases, minus sign for exclusion — familiar to users

websearch_to_tsquery is the right function for production search boxes. It handles messy user input gracefully and supports the syntax users expect.

Searching a Table

Given a table of articles:

sql
CREATE TABLE articles (
  id         BIGSERIAL PRIMARY KEY,
  title      TEXT NOT NULL,
  body       TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);

A basic full-text search:

sql
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
   @@ websearch_to_tsquery('english', 'postgresql performance');

This works but is slow — it calls to_tsvector on every row for every query. No index is used.

GIN Indexes: Making Search Fast

The solution is to pre-compute the tsvector and index it. The index type for tsvector is GIN (Generalized Inverted Index). A GIN index maps each lexeme to the set of rows that contain it — exactly like the index at the back of a book.

Option 1: Index on an Expression

sql
CREATE INDEX idx_articles_fts
ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

The query must use the exact same expression for the index to be used:

sql
-- This uses the index:
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
   @@ websearch_to_tsquery('english', 'postgresql performance');

Expression indexes work but have a maintenance overhead: the expression is recomputed on every INSERT and UPDATE.

Option 2: Generated Stored Column (Recommended)

A better pattern in PostgreSQL 12+: store the tsvector as a generated column and index that.

sql
ALTER TABLE articles
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX idx_articles_search_vector
ON articles
USING GIN (search_vector);

Now queries are clean and simple:

sql
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql performance');

The search_vector column is maintained by PostgreSQL automatically on INSERT and UPDATE. The GIN index is built on the stored column. Query performance is fast.

Weighting Multiple Columns

Different columns have different importance — a match in the title should rank higher than a match in the body. setweight assigns a weight label (A, B, C, D — in descending importance) to a tsvector.

sql
-- Title gets weight A (highest), body gets weight B
ALTER TABLE articles
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body,  '')), 'B')
  ) STORED;

Weights are used by the ranking functions to score results. A query matching in the title scores higher than the same query matching only in the body.

Ranking Results

Matching tells you which rows contain the search terms. Ranking tells you which rows are most relevant. PostgreSQL has two ranking functions:

`ts_rank`

sql
SELECT
  id,
  title,
  ts_rank(search_vector, query) AS rank
FROM articles,
     websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

ts_rank computes relevance based on how frequently the query terms appear in the document. Higher frequency = higher rank.

`ts_rank_cd`

sql
SELECT
  id,
  title,
  ts_rank_cd(search_vector, query) AS rank
FROM articles,
     websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

ts_rank_cd ("cover density") also considers how close together the matching terms appear in the document. Generally produces more intuitive rankings when terms appear in proximity to each other.

Combining Rank with Recency

Pure relevance ranking sometimes buries recent results. A common production pattern:

sql
SELECT
  id,
  title,
  created_at,
  ts_rank_cd(search_vector, query) * (
    1.0 / (1 + extract(epoch from now() - created_at) / 86400.0)
  ) AS combined_score
FROM articles,
     websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE search_vector @@ query
ORDER BY combined_score DESC
LIMIT 20;

This boosts newer results, decaying exponentially with age. Adjust the decay constant to taste.

Highlighted Snippets with `ts_headline`

Search results usually show a snippet of the original text with matching terms highlighted. ts_headline does this:

sql
SELECT
  id,
  title,
  ts_headline(
    'english',
    body,
    websearch_to_tsquery('english', 'postgresql performance'),
    'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>'
  ) AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql performance')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'postgresql performance')) DESC
LIMIT 10;

ts_headline options:

MaxWords / MinWords — snippet length
StartSel / StopSel — HTML tags wrapping matched terms
MaxFragments — return multiple non-contiguous snippets
FragmentDelimiter — separator between fragments (default " ... ")

Important: do not run ts_headline on the tsvector column — run it on the original text column. ts_headline is slow (it re-processes the document to find context). Compute it only on the result rows, never in a WHERE clause.

Multi-Column Search

For tables with many text fields, aggregate them all into one search_vector:

sql
CREATE TABLE products (
  id          BIGSERIAL PRIMARY KEY,
  name        TEXT NOT NULL,
  description TEXT,
  tags        TEXT,   -- comma-separated or space-separated tags
  category    TEXT
);

ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name,        '')), 'A') ||
    setweight(to_tsvector('english', coalesce(tags,        '')), 'A') ||
    setweight(to_tsvector('english', coalesce(category,    '')), 'B') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'C')
  ) STORED;

CREATE INDEX idx_products_search ON products USING GIN (search_vector);

Querying is the same regardless of how many columns feed the vector:

sql
SELECT id, name, category
FROM products
WHERE search_vector @@ websearch_to_tsquery('english', 'ergonomic standing desk')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'ergonomic standing desk')) DESC
LIMIT 20;

Language Configuration

Every to_tsvector and to_tsquery call takes a language configuration name as the first argument. This controls which stop words are removed and which stemming rules apply.

sql
-- English stemming
SELECT to_tsvector('english', 'running runner runs');
-- 'run':1,2,3  (all three map to the same lexeme)

-- Spanish stemming
SELECT to_tsvector('spanish', 'corriendo corredor corre');

-- Simple: no stemming, no stop words, just lowercase
SELECT to_tsvector('simple', 'The quick brown Fox');
-- 'brown':3 'fox':4 'quick':2 'the':1

Use 'simple' when you need exact-word matching (product codes, usernames) or when working with languages not supported by built-in dictionaries.

To see available configurations:

sql
SELECT cfgname FROM pg_ts_config;

Combining Full-Text Search with Filters

Full-text search works naturally alongside SQL filters. The GIN index is used for the text filter; additional B-tree indexes can be used for other conditions.

sql
-- Full-text search scoped to a specific category and date range
SELECT id, title, published_at
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'index performance')
  AND category = 'database'
  AND published_at > now() - interval '1 year'
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'index performance')) DESC
LIMIT 20;

PostgreSQL's planner will use the GIN index for the text condition and may use a B-tree index on category or published_at depending on estimated selectivity.

Performance Considerations

GIN index build time: GIN indexes are expensive to build on large tables. For tables with millions of rows, build the index during a maintenance window and be aware that inserts and updates are slightly slower while the index is maintained.

ts_headline cost: It re-parses the document to find the best snippet context. Only call it on the final result set (after LIMIT), never in a subquery used for filtering.

ts_rank is computed per row: Sorting a million rows by rank is expensive. Always filter with WHERE search_vector @@ query first, then rank the surviving rows. The GIN index reduces the candidate set to a small fraction of the table before ranking.

websearch_to_tsquery is cheap: Parse the query once (ideally in application code) and pass it as a parameter. Don't reconstruct it in every subquery.

sql
-- Efficient: query computed once, used twice
WITH q AS (
  SELECT websearch_to_tsquery('english', $1) AS query
)
SELECT
  a.id,
  a.title,
  ts_rank_cd(a.search_vector, q.query) AS rank,
  ts_headline('english', a.body, q.query) AS snippet
FROM articles a, q
WHERE a.search_vector @@ q.query
ORDER BY rank DESC
LIMIT 10;

When PostgreSQL FTS Is Not Enough

PostgreSQL full-text search handles the majority of application search requirements. The cases where Elasticsearch or a dedicated search engine makes more sense:

Autocomplete / prefix search: PostgreSQL FTS handles whole words. For character-by-character autocomplete, consider pg_trgm (trigram extension) or a dedicated autocomplete system.
Fuzzy matching / typo tolerance: pg_trgm provides edit-distance-based similarity search.
Real-time, sub-millisecond search across billions of rows: At extreme scale, dedicated search clusters with sharding and caching start to win.
Complex relevance tuning: Elasticsearch's BM25 and field boosting configuration is more flexible than PostgreSQL's ranking functions for search-intensive products.

For everything else — blog search, document search, product search, internal tooling — PostgreSQL is the right tool.

Practical Exercise

Build a searchable article store from scratch:

sql
-- 1. Create the table
CREATE TABLE blog_posts (
  id         BIGSERIAL PRIMARY KEY,
  title      TEXT NOT NULL,
  body       TEXT NOT NULL,
  author     TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);

-- 2. Add the generated tsvector column
ALTER TABLE blog_posts
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title,  '')), 'A') ||
    setweight(to_tsvector('english', coalesce(author, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(body,   '')), 'C')
  ) STORED;

-- 3. Create the GIN index
CREATE INDEX idx_blog_posts_fts ON blog_posts USING GIN (search_vector);

-- 4. Insert some data
INSERT INTO blog_posts (title, body, author) VALUES
  ('Getting Started with PostgreSQL', 'PostgreSQL is a powerful open source relational database...', 'Alice Chen'),
  ('Index Internals and Performance', 'Understanding B-tree structure helps write faster queries...', 'Bob Kim'),
  ('Row-Level Security in Production', 'RLS policies enforce data isolation at the database layer...', 'Alice Chen'),
  ('Full-Text Search Without Elasticsearch', 'PostgreSQL tsvector and GIN indexes provide production-grade search...', 'Carol Wang');

-- 5. Search with ranking
SELECT
  id,
  title,
  author,
  ts_rank_cd(search_vector, q) AS rank
FROM blog_posts,
     websearch_to_tsquery('english', 'postgresql index performance') AS q
WHERE search_vector @@ q
ORDER BY rank DESC;

-- 6. Add highlighted snippets
SELECT
  title,
  ts_headline('english', body, q,
    'MaxWords=20, MinWords=10, StartSel=**, StopSel=**'
  ) AS snippet
FROM blog_posts,
     websearch_to_tsquery('english', 'postgresql index') AS q
WHERE search_vector @@ q;

-- 7. Verify index is used
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM blog_posts
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql');
-- Look for: Bitmap Index Scan on idx_blog_posts_fts

Summary

PostgreSQL's full-text search system provides production-grade text search without external dependencies:

tsvector stores a normalized, stemmed representation of document text. tsquery expresses search queries with boolean operators. The @@ operator matches them. A GIN index on the tsvector column makes search fast at scale. Generated stored columns keep the vector automatically updated with no application code. ts_rank_cd provides relevance ranking. ts_headline generates highlighted snippets.

For the majority of application search use cases — content search, product search, document discovery — this is all you need. Elasticsearch solves a different problem at a different scale.

Next up: P-8 — Performance Tuning for Application Engineers — EXPLAIN ANALYZE in depth, connection pooling with PgBouncer, key configuration parameters, and eliminating the N+1 query problem.

PreviousModule P-6: Authentication, Row-Level Security, and Access Control Next Module P-8: Performance Tuning for Application Engineers