Mar 03, 2025
6 min read

Navigating PostgreSQL - The EXPLAIN command and sequential scans

Learn how to use the EXPLAIN command to understand how PostgreSQL executes your queries, and how to optimize your queries for better performance.
#PostgreSQL #Database #Tutorial #EXPLAIN #SequentialScan

Table of Contents

  1. The Power of EXPLAIN in PostgreSQL
    1. What is EXPLAIN?
    2. Understanding the EXPLAIN Output
    3. PostgreSQL’s Cost Model Explained
    4. Understanding Page Costs: Sequential vs Random Access
    5. How PostgreSQL Makes Planning Decisions
  2. Sequential Scan: The Fundamental Table Access Method
    1. When Sequential Scan is Used
    2. Advantages of Sequential Scan
    3. Disadvantages of Sequential Scan
    4. Optimizing Sequential Scans
  3. Summary
  4. Next Steps
  5. References

The Power of EXPLAIN in PostgreSQL

When optimizing database performance, it’s essential to understand how your database executes queries. PostgreSQL provides the powerful EXPLAIN command that reveals the execution plan for your queries, helping you identify potential bottlenecks and optimization opportunities.

What is EXPLAIN?

The EXPLAIN command shows how PostgreSQL will execute your query. It reveals:

  • How tables will be scanned (sequential scan, index scan, etc.)
  • What join methods will be used for multiple tables
  • Cost estimates and other performance metrics

Here’s how to use it:

EXPLAIN SELECT * FROM users;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on users  (cost=0.00..258.00 rows=5000 width=244)
                           ^      ^           ^         ^
                           |      |           |         |
                        startup total       output   average
                        cost    cost        rows     row size

Understanding the EXPLAIN Output

Let’s break down each component:

  • Startup cost (0.00): Estimated computational units needed before the first row can be returned
  • Total cost (258.00): Estimated computational units for the entire query execution
  • Rows (5000): Estimated number of rows the operation will return
  • Width (244): Estimated average size of each row in bytes
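The total cost in the plan above can be reproduced by hand. For a sequential scan, PostgreSQL estimates roughly (disk pages × seq_page_cost) + (rows × cpu_tuple_cost). Here is a minimal sketch using the default parameter values, assuming a hypothetical page count of 208 for the users table (chosen so the arithmetic matches the plan shown):

```python
# Back-of-the-envelope reproduction of the Seq Scan cost estimate:
# total_cost ~ pages * seq_page_cost + rows * cpu_tuple_cost
seq_page_cost = 1.0    # default: baseline cost of one sequential page read
cpu_tuple_cost = 0.01  # default: cost of processing one row

pages = 208   # hypothetical page count for the users table
rows = 5000   # estimated rows, as reported in the plan

total_cost = pages * seq_page_cost + rows * cpu_tuple_cost
print(total_cost)  # 258.0 -- matches the plan's total cost
```

The startup cost is 0.00 because a sequential scan can emit its first row immediately, with no preparatory work.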

PostgreSQL’s Cost Model Explained

PostgreSQL uses a cost-based optimizer that evaluates different execution plans and selects the one with the lowest estimated cost.

The Unit of Cost

The baseline unit for cost is “the effort to fetch a single page sequentially.” This provides a reference point for all other operations:

-- Key cost parameters that influence planning decisions
seq_page_cost = 1.0                   -- Cost of sequential page fetch (baseline)
random_page_cost = 4.0                -- Cost of random page fetch  
cpu_tuple_cost = 0.01                 -- Cost of processing a single row
cpu_index_tuple_cost = 0.005          -- Cost of processing an index entry
cpu_operator_cost = 0.0025            -- Cost of processing an operator

-- You can adjust these parameters for your environment
SET random_page_cost = 2.0;           -- Common setting for SSD storage
SHOW random_page_cost;                -- Verify the change

Understanding Page Costs: Sequential vs Random Access

Think of PostgreSQL’s cost model like a librarian deciding the fastest way to find books:

Sequential Page Cost (seq_page_cost = 1.0)

Imagine walking along a bookshelf, checking each book in order. This is a sequential scan - it’s predictable and consistent. PostgreSQL assigns this a baseline cost of 1.0 unit.

Random Page Cost (random_page_cost = 4.0)

Now imagine having to jump around the library, consulting the catalog and fetching books from different sections. This random access takes more effort! By default, PostgreSQL considers random access to be 4 times more expensive than sequential access.

Note: Cost values are unitless and relative, with 1.0 representing the baseline cost of a sequential page read.
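To see how much the 4x multiplier matters, compare the estimated cost of touching the same 100 pages each way - a toy calculation using the default cost parameters:

```python
# Cost of reading 100 pages, sequentially vs randomly,
# using PostgreSQL's default cost parameters.
seq_page_cost = 1.0
random_page_cost = 4.0
pages = 100

sequential_cost = pages * seq_page_cost
random_cost = pages * random_page_cost
print(sequential_cost)  # 100.0
print(random_cost)      # 400.0
```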

Why This Matters

These cost settings determine when PostgreSQL chooses indexes:

  • With HDDs (traditional hard drives): Random access is genuinely slower (heads must physically move)
  • With SSDs: Random access is almost as fast as sequential access

HDD vs SSD

Because an SSD has no moving parts, the physical penalty for random access largely disappears. For modern SSDs, many DBAs therefore set random_page_cost much lower (often 1.1-2.0), which encourages PostgreSQL to use indexes more frequently and can improve performance.

How PostgreSQL Makes Planning Decisions

The query planner evaluates multiple potential execution plans by:

  1. Estimating the number of rows that would match the query conditions
  2. Calculating the cost of different access methods (sequential scans, index scans, etc.)
  3. Considering table statistics and data distribution
  4. Selecting the plan with the lowest total estimated cost
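The steps above can be sketched as a cost comparison. This is a deliberately simplified model, not the real planner: it weighs a full sequential scan against an idealized index scan that pays one random page fetch per matching row, then picks whichever estimate is lower. The table size is hypothetical.

```python
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # Read every page in physical order, process every row.
    return pages * seq_page_cost + rows * cpu_tuple_cost

def index_scan_cost(matching_rows, random_page_cost=4.0,
                    cpu_index_tuple_cost=0.005, cpu_tuple_cost=0.01):
    # Idealized: one random heap-page fetch plus index/row
    # processing per matching row (real estimates are subtler).
    return matching_rows * (random_page_cost + cpu_index_tuple_cost
                            + cpu_tuple_cost)

pages, rows = 208, 5000  # hypothetical table
for selectivity in (0.01, 0.10, 0.50):
    matching = int(rows * selectivity)
    seq, idx = seq_scan_cost(pages, rows), index_scan_cost(matching)
    winner = "index scan" if idx < seq else "seq scan"
    print(f"{selectivity:.0%} of rows: seq={seq:.2f}, "
          f"index={idx:.2f} -> {winner}")
```

Even in this toy model, the index scan wins only at low selectivity; once a query matches a sizable fraction of the table, the sequential scan's fixed cost is cheaper.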

Sequential Scan: The Fundamental Table Access Method

A sequential scan reads the entire table from start to finish, examining every row.

Sequential Scan

EXPLAIN SELECT * FROM users;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on users  (cost=0.00..258.00 rows=5000 width=244)

When Sequential Scan is Used

PostgreSQL will choose a sequential scan when:

  1. Small Tables: When a table is small enough that reading the entire table is faster than using an index
  2. Large Result Sets: When a query is expected to return a significant portion of the table (typically >5-10%)
  3. No Suitable Indexes: When no indexes exist that could help with the query
  4. Full Table Operations: For operations that need to process every row (like COUNT(*) without conditions)
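The ">5-10%" rule of thumb in point 2 can be made concrete with a rough break-even calculation using the default cost parameters and a hypothetical table (208 pages, 5000 rows). The model is deliberately crude: it charges one random page fetch per row found via an index and ignores caching and bitmap scans, which is why it lands below the real-world 5-10% range.

```python
# Rough break-even point: how many matching rows before an
# idealized index scan becomes more expensive than a full scan?
seq_page_cost, random_page_cost, cpu_tuple_cost = 1.0, 4.0, 0.01
pages, rows = 208, 5000  # hypothetical table size

full_scan = pages * seq_page_cost + rows * cpu_tuple_cost  # 258.0
per_row_via_index = random_page_cost + cpu_tuple_cost      # 4.01 per matching row

break_even_rows = full_scan / per_row_via_index
print(f"index access wins below ~{break_even_rows:.0f} matching rows "
      f"({break_even_rows / rows:.1%} of the table)")
```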

Advantages of Sequential Scan

  • Simplicity: No index lookup overhead
  • Predictable I/O Pattern: Reads data in physical order, which is efficient for disk I/O
  • Cache-Friendly: Sequential access lets the OS read-ahead mechanism prefetch pages effectively
  • No Index Maintenance: Doesn’t require indexes to be maintained

Disadvantages of Sequential Scan

  • Inefficient for Large Tables: Reading millions of rows when you only need a few is wasteful
  • Slower for Selective Queries: When you need only a small subset of data, indexes usually perform better
  • Resource Intensive: Consumes more I/O, CPU, and memory when scanning large tables

Optimizing Sequential Scans

  • Increase effective_cache_size: Helps PostgreSQL estimate available system memory for caching
  • Adjust work_mem: Provides more memory for the sort and hash operations that often follow large scans
  • Consider table partitioning: Reduces the amount of data scanned for queries that can target specific partitions
  • Use parallel sequential scans: Set max_parallel_workers_per_gather to utilize multiple CPU cores

Summary

The EXPLAIN command is your window into PostgreSQL’s decision-making process. By understanding how the query planner evaluates costs and chooses execution plans, you can make informed decisions about:

  • When to create indexes
  • How to structure your queries
  • How to configure PostgreSQL for your hardware
  • When sequential scans are appropriate and when they should be avoided

Next Steps

  • Learn more about how data is stored on disk
  • Explore index scans and when they’re preferred over sequential scans
  • Discover other types of scans (bitmap scans, index-only scans)
  • Master techniques to optimize your queries for better performance

References