tsvector, tsquery, GIN indexes, relevance ranking, and generated tsvector columns — search without Elasticsearch.
P-7 — Full-Text Search
The standard answer to "we need search" is Elasticsearch. Spin up a cluster, sync your data, write query DSL, manage index mappings. For large-scale dedicated search, that trade-off sometimes makes sense. For most application search — finding documents, filtering content, search-as-you-type boxes — you already have everything you need inside PostgreSQL.
This module covers PostgreSQL's full-text search system: how it represents and indexes text, how queries work, how to rank results by relevance, and how to set it up so searches stay fast at scale.
The Problem With LIKE
The first instinct for search is LIKE:
sqlSELECT * FROM articles WHERE body LIKE '%postgresql performance%';
This works. It also requires a full sequential scan of the table for every search — no index can help with a leading wildcard. At a thousand rows it is fast. At a million rows it is a problem.
ILIKE (case-insensitive) is slower still. And neither handles linguistic variations: a search for "running" will not find articles containing "runs" or "ran."
Full-text search solves both problems: it is indexable and it understands language.
The Two Core Types
PostgreSQL full-text search is built around two data types:
tsvector — a preprocessed representation of a document. It is a sorted list of lexemes (normalized word stems) with their positions in the original text. When you convert text to tsvector, PostgreSQL normalizes words (removing stop words like "the", stemming "running" → "run"), and records where each term appeared.
tsquery — a search query. It is a boolean expression of lexemes that gets matched against a tsvector.
sql-- Convert text to tsvector SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog'); -- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2 -- Note: "The", "over" removed (stop words), "jumps"→"jump", "lazy"→"lazi" -- Convert a search query to tsquery SELECT to_tsquery('english', 'jump & fox'); -- Result: 'jump' & 'fox' -- Match a tsvector against a tsquery with the @@ operator SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog') @@ to_tsquery('english', 'jump & fox'); -- Result: true
The @@ operator is the match operator. It returns true if the tsvector satisfies the tsquery.
Building Queries with tsquery
to_tsquery
The most explicit form. Uses boolean operators:
sql-- AND: both terms must be present SELECT to_tsquery('english', 'postgres & performance'); -- OR: either term SELECT to_tsquery('english', 'postgres | mysql'); -- NOT: exclude a term SELECT to_tsquery('english', 'postgres & !mysql'); -- Phrase search (terms must be adjacent, in order) SELECT to_tsquery('english', 'postgres <-> performance'); -- Proximity (within N words) SELECT to_tsquery('english', 'postgres <2> performance');
to_tsquery requires valid tsquery syntax. If the user types bare text, it will error.
plainto_tsquery
Takes plain text and converts it to an AND query. Safe for direct user input.
sqlSELECT plainto_tsquery('english', 'postgresql performance tuning'); -- Result: 'postgresql' & 'perform' & 'tune'
websearch_to_tsquery
Understands Google-style search syntax. The best choice for search boxes.
sqlSELECT websearch_to_tsquery('english', 'postgresql "full text" -mysql'); -- postgres & 'full' <-> 'text' & !'mysql' -- Quoted phrases, minus sign for exclusion — familiar to users
websearch_to_tsquery is the right function for production search boxes. It handles messy user input gracefully and supports the syntax users expect.
Searching a Table
Given a table of articles:
sqlCREATE TABLE articles ( id BIGSERIAL PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL, created_at TIMESTAMPTZ DEFAULT now() );
A basic full-text search:
sqlSELECT id, title FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ websearch_to_tsquery('english', 'postgresql performance');
This works but is slow — it calls to_tsvector on every row for every query. No index is used.
GIN Indexes: Making Search Fast
The solution is to pre-compute the tsvector and index it. The index type for tsvector is GIN (Generalized Inverted Index). A GIN index maps each lexeme to the set of rows that contain it — exactly like the index at the back of a book.
Option 1: Index on an Expression
sqlCREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', title || ' ' || body));
The query must use the exact same expression for the index to be used:
sql-- This uses the index: SELECT id, title FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ websearch_to_tsquery('english', 'postgresql performance');
Expression indexes work but have a maintenance overhead: the expression is recomputed on every INSERT and UPDATE.
Option 2: Generated Stored Column (Recommended)
A better pattern in PostgreSQL 12+: store the tsvector as a generated column and index that.
sqlALTER TABLE articles ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')) ) STORED; CREATE INDEX idx_articles_search_vector ON articles USING GIN (search_vector);
Now queries are clean and simple:
sqlSELECT id, title FROM articles WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql performance');
The search_vector column is maintained by PostgreSQL automatically on INSERT and UPDATE. The GIN index is built on the stored column. Query performance is fast.
Weighting Multiple Columns
Different columns have different importance — a match in the title should rank higher than a match in the body. setweight assigns a weight label (A, B, C, D — in descending importance) to a tsvector.
sql-- Title gets weight A (highest), body gets weight B ALTER TABLE articles ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') ) STORED;
Weights are used by the ranking functions to score results. A query matching in the title scores higher than the same query matching only in the body.
Ranking Results
Matching tells you which rows contain the search terms. Ranking tells you which rows are most relevant. PostgreSQL has two ranking functions:
ts_rank
sqlSELECT id, title, ts_rank(search_vector, query) AS rank FROM articles, websearch_to_tsquery('english', 'postgresql performance') AS query WHERE search_vector @@ query ORDER BY rank DESC;
ts_rank computes relevance based on how frequently the query terms appear in the document. Higher frequency = higher rank.
ts_rank_cd
sqlSELECT id, title, ts_rank_cd(search_vector, query) AS rank FROM articles, websearch_to_tsquery('english', 'postgresql performance') AS query WHERE search_vector @@ query ORDER BY rank DESC;
ts_rank_cd ("cover density") also considers how close together the matching terms appear in the document. Generally produces more intuitive rankings when terms appear in proximity to each other.
Combining Rank with Recency
Pure relevance ranking sometimes buries recent results. A common production pattern:
sqlSELECT id, title, created_at, ts_rank_cd(search_vector, query) * ( 1.0 / (1 + extract(epoch from now() - created_at) / 86400.0) ) AS combined_score FROM articles, websearch_to_tsquery('english', 'postgresql performance') AS query WHERE search_vector @@ query ORDER BY combined_score DESC LIMIT 20;
This boosts newer results, decaying exponentially with age. Adjust the decay constant to taste.
Highlighted Snippets with ts_headline
Search results usually show a snippet of the original text with matching terms highlighted. ts_headline does this:
sqlSELECT id, title, ts_headline( 'english', body, websearch_to_tsquery('english', 'postgresql performance'), 'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>' ) AS snippet FROM articles WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql performance') ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'postgresql performance')) DESC LIMIT 10;
ts_headline options:
MaxWords/MinWords— snippet lengthStartSel/StopSel— HTML tags wrapping matched termsMaxFragments— return multiple non-contiguous snippetsFragmentDelimiter— separator between fragments (default" ... ")
Important: do not run ts_headline on the tsvector column — run it on the original text column. ts_headline is slow (it re-processes the document to find context). Compute it only on the result rows, never in a WHERE clause.
Multi-Column Search
For tables with many text fields, aggregate them all into one search_vector:
sqlCREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, description TEXT, tags TEXT, -- comma-separated or space-separated tags category TEXT ); ALTER TABLE products ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(name, '')), 'A') || setweight(to_tsvector('english', coalesce(tags, '')), 'A') || setweight(to_tsvector('english', coalesce(category, '')), 'B') || setweight(to_tsvector('english', coalesce(description, '')), 'C') ) STORED; CREATE INDEX idx_products_search ON products USING GIN (search_vector);
Querying is the same regardless of how many columns feed the vector:
sqlSELECT id, name, category FROM products WHERE search_vector @@ websearch_to_tsquery('english', 'ergonomic standing desk') ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'ergonomic standing desk')) DESC LIMIT 20;
Language Configuration
Every to_tsvector and to_tsquery call takes a language configuration name as the first argument. This controls which stop words are removed and which stemming rules apply.
sql-- English stemming SELECT to_tsvector('english', 'running runner runs'); -- 'run':1,2,3 (all three map to the same lexeme) -- Spanish stemming SELECT to_tsvector('spanish', 'corriendo corredor corre'); -- Simple: no stemming, no stop words, just lowercase SELECT to_tsvector('simple', 'The quick brown Fox'); -- 'brown':3 'fox':4 'quick':2 'the':1
Use 'simple' when you need exact-word matching (product codes, usernames) or when working with languages not supported by built-in dictionaries.
To see available configurations:
sqlSELECT cfgname FROM pg_ts_config;
Combining Full-Text Search with Filters
Full-text search works naturally alongside SQL filters. The GIN index is used for the text filter; additional B-tree indexes can be used for other conditions.
sql-- Full-text search scoped to a specific category and date range SELECT id, title, published_at FROM articles WHERE search_vector @@ websearch_to_tsquery('english', 'index performance') AND category = 'database' AND published_at > now() - interval '1 year' ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'index performance')) DESC LIMIT 20;
PostgreSQL's planner will use the GIN index for the text condition and may use a B-tree index on category or published_at depending on estimated selectivity.
Performance Considerations
GIN index build time: GIN indexes are expensive to build on large tables. For tables with millions of rows, build the index during a maintenance window and be aware that inserts and updates are slightly slower while the index is maintained.
ts_headline cost: It re-parses the document to find the best snippet context. Only call it on the final result set (after LIMIT), never in a subquery used for filtering.
ts_rank is computed per row: Sorting a million rows by rank is expensive. Always filter with WHERE search_vector @@ query first, then rank the surviving rows. The GIN index reduces the candidate set to a small fraction of the table before ranking.
websearch_to_tsquery is cheap: Parse the query once (ideally in application code) and pass it as a parameter. Don't reconstruct it in every subquery.
sql-- Efficient: query computed once, used twice WITH q AS ( SELECT websearch_to_tsquery('english', $1) AS query ) SELECT a.id, a.title, ts_rank_cd(a.search_vector, q.query) AS rank, ts_headline('english', a.body, q.query) AS snippet FROM articles a, q WHERE a.search_vector @@ q.query ORDER BY rank DESC LIMIT 10;
When PostgreSQL FTS Is Not Enough
PostgreSQL full-text search handles the majority of application search requirements. The cases where Elasticsearch or a dedicated search engine makes more sense:
- Autocomplete / prefix search: PostgreSQL FTS handles whole words. For character-by-character autocomplete, consider
pg_trgm(trigram extension) or a dedicated autocomplete system. - Fuzzy matching / typo tolerance:
pg_trgmprovides edit-distance-based similarity search. - Real-time, sub-millisecond search across billions of rows: At extreme scale, dedicated search clusters with sharding and caching start to win.
- Complex relevance tuning: Elasticsearch's BM25 and field boosting configuration is more flexible than PostgreSQL's ranking functions for search-intensive products.
For everything else — blog search, document search, product search, internal tooling — PostgreSQL is the right tool.
Practical Exercise
Build a searchable article store from scratch:
sql-- 1. Create the table CREATE TABLE blog_posts ( id BIGSERIAL PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL, author TEXT NOT NULL, created_at TIMESTAMPTZ DEFAULT now() ); -- 2. Add the generated tsvector column ALTER TABLE blog_posts ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(author, '')), 'B') || setweight(to_tsvector('english', coalesce(body, '')), 'C') ) STORED; -- 3. Create the GIN index CREATE INDEX idx_blog_posts_fts ON blog_posts USING GIN (search_vector); -- 4. Insert some data INSERT INTO blog_posts (title, body, author) VALUES ('Getting Started with PostgreSQL', 'PostgreSQL is a powerful open source relational database...', 'Alice Chen'), ('Index Internals and Performance', 'Understanding B-tree structure helps write faster queries...', 'Bob Kim'), ('Row-Level Security in Production', 'RLS policies enforce data isolation at the database layer...', 'Alice Chen'), ('Full-Text Search Without Elasticsearch', 'PostgreSQL tsvector and GIN indexes provide production-grade search...', 'Carol Wang'); -- 5. Search with ranking SELECT id, title, author, ts_rank_cd(search_vector, q) AS rank FROM blog_posts, websearch_to_tsquery('english', 'postgresql index performance') AS q WHERE search_vector @@ q ORDER BY rank DESC; -- 6. Add highlighted snippets SELECT title, ts_headline('english', body, q, 'MaxWords=20, MinWords=10, StartSel=**, StopSel=**' ) AS snippet FROM blog_posts, websearch_to_tsquery('english', 'postgresql index') AS q WHERE search_vector @@ q; -- 7. Verify index is used EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM blog_posts WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql'); -- Look for: Bitmap Index Scan on idx_blog_posts_fts
Summary
PostgreSQL's full-text search system provides production-grade text search without external dependencies:
tsvector stores a normalized, stemmed representation of document text. tsquery expresses search queries with boolean operators. The @@ operator matches them. A GIN index on the tsvector column makes search fast at scale. Generated stored columns keep the vector automatically updated with no application code. ts_rank_cd provides relevance ranking. ts_headline generates highlighted snippets.
For the majority of application search use cases — content search, product search, document discovery — this is all you need. Elasticsearch solves a different problem at a different scale.
Next up: P-8 — Performance Tuning for Application Engineers — EXPLAIN ANALYZE in depth, connection pooling with PgBouncer, key configuration parameters, and eliminating the N+1 query problem.