Navigating PostgreSQL - The explain command and sequential scans

The Power of EXPLAIN in PostgreSQL
Sequential Scan: The Fundamental Table Access Method
Summary
Next Steps
References

The Power of EXPLAIN in PostgreSQL

When optimizing database performance, it’s essential to understand how your database executes queries. PostgreSQL provides the powerful EXPLAIN command that reveals the execution plan for your queries, helping you identify potential bottlenecks and optimization opportunities.

What is EXPLAIN?

The EXPLAIN command shows how PostgreSQL will execute your query. It reveals:

How tables will be scanned (sequential scan, index scan, etc.)
What join methods will be used for multiple tables
Cost estimates and other performance metrics

Here’s how to use it:

EXPLAIN SELECT * FROM users;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on users  (cost=0.00..258.00 rows=5000 width=244)
                           ^      ^           ^         ^
                           |      |           |         |
                        startup total       output   average
                        cost    cost        rows     row size

Understanding the EXPLAIN Output

Let’s break down each component:

Startup cost (0.00): Estimated computational units needed before the first row can be returned
Total cost (258.00): Estimated computational units for the entire query execution
Rows (5000): Estimated number of rows the operation will return
Width (244): Estimated average size of each row in bytes

PostgreSQL’s Cost Model Explained

PostgreSQL uses a cost-based optimizer that evaluates different execution plans and selects the one with the lowest estimated cost.

The Unit of Cost

The baseline unit for cost is “the effort to fetch a single page sequentially.” This provides a reference point for all other operations:

-- Key cost parameters that influence planning decisions
seq_page_cost = 1.0                   -- Cost of sequential page fetch (baseline)
random_page_cost = 4.0                -- Cost of random page fetch
cpu_tuple_cost = 0.01                 -- Cost of processing a single row
cpu_index_tuple_cost = 0.005          -- Cost of processing an index entry
cpu_operator_cost = 0.0025            -- Cost of processing an operator

-- You can adjust these parameters for your environment
SET random_page_cost = 2.0;           -- Common setting for SSD storage
SHOW random_page_cost;                -- Verify the change

Understanding Page Costs: Sequential vs Random Access

Think of PostgreSQL’s cost model like a librarian deciding the fastest way to find books:

Sequential Page Cost (seq_page_cost = 1.0)

Imagine walking along a bookshelf, checking each book in order. This is a sequential scan - it’s predictable and consistent. PostgreSQL assigns this a baseline cost of 1.0 unit.

Random Page Cost (random_page_cost = 4.0)

Now imagine having to jump around the library, consulting the catalog and fetching books from different sections. This random access takes more effort! By default, PostgreSQL considers random access to be 4 times more expensive than sequential access.

Note: Cost values are unitless and relative, with 1.0 representing the baseline cost of a sequential page read.

Why This Matters

These cost settings determine when PostgreSQL chooses indexes:

With HDDs (traditional hard drives): Random access is genuinely slower (heads must physically move)
With SSDs: Random access is almost as fast as sequential access

HDD vs SSD

SSD has no moving parts, so random access is almost as fast as sequential access.

For modern SSDs, many DBAs adjust the random_page_cost much lower (often to 1.1-2.0), which encourages PostgreSQL to use indexes more frequently, potentially improving performance.

How PostgreSQL Makes Planning Decisions

The query planner evaluates multiple potential execution plans by:

Estimating the number of rows that would match the query conditions
Calculating the cost of different access methods (sequential scans, index scans, etc.)
Considering table statistics and data distribution
Selecting the plan with the lowest total estimated cost

Sequential Scan: The Fundamental Table Access Method

A sequential scan reads the entire table from start to finish, examining every row.

Sequential Scan

EXPLAIN SELECT * FROM users;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on users  (cost=0.00..258.00 rows=5000 width=244)

When Sequential Scan is Used

PostgreSQL will choose a sequential scan when:

Small Tables: When a table is small enough that reading the entire table is faster than using an index
Large Result Sets: When a query is expected to return a significant portion of the table (typically >5-10%)
No Suitable Indexes: When no indexes exist that could help with the query
Full Table Operations: For operations that need to process every row (like COUNT(*) without conditions)

Advantages of Sequential Scan

Simplicity: No index lookup overhead
Predictable I/O Pattern: Reads data in physical order, which is efficient for disk I/O
Memory Efficiency: Can benefit from OS-level read-ahead caching
No Index Maintenance: Doesn’t require indexes to be maintained

Disadvantages of Sequential Scan

Inefficient for Large Tables: Reading millions of rows when you only need a few is wasteful
Slower for Selective Queries: When you need only a small subset of data, indexes usually perform better
Resource Intensive: Consumes more I/O, CPU, and memory when scanning large tables

Optimizing Sequential Scans

Increase effective_cache_size: Helps PostgreSQL estimate available system memory for caching
Adjust work_mem: Provides more memory for sorting operations during scans
Consider table partitioning: Reduces the amount of data scanned for queries that can target specific partitions
Use parallel sequential scans: Set max_parallel_workers_per_gather to utilize multiple CPU cores

Summary

The EXPLAIN command is your window into PostgreSQL’s decision-making process. By understanding how the query planner evaluates costs and chooses execution plans, you can make informed decisions about:

When to create indexes
How to structure your queries
How to configure PostgreSQL for your hardware
When sequential scans are appropriate and when they should be avoided

Next Steps

Learn more about how data is stored on disk
Explore index scans and when they’re preferred over sequential scans
Discover other types of scans (bitmap scans, index-only scans)
Master techniques to optimize your queries for better performance