Vooteam | Indexing in Postgres

Indexing introduction

Indexing is a technique used to speed up database queries by creating a structured reference to where data is stored.

Without an index, the database must scan every row in a table to find the required information. In large tables, this means the search can take a long time—especially if the data is located near the end.

An index works like a shortcut, allowing the database to quickly locate the exact position of the data instead of checking each row one by one, resulting in much faster query performance.

Visualization for finding the last entry:

If the table was ordered alphabetically, searching for a name could happen a lot faster because we could skip looking for the data in certain rows. If we wanted to search for “Zack” and we know the data is in alphabetical order we could jump down to halfway through the data to see if Zack comes before or after that row. We could then half the remaining rows and make the same comparison.

This took 3 comparisons to find the right answer instead of 8 in the unindexed data. Indexes allow us to create sorted lists without having to create all new sorted tables, which would take up a lot of storage space.

What Exactly Is an Index?

An index is a separate data structure that stores the values of a specific column along with references (pointers) to the rows where the actual data lives in the main table.

Instead of searching through the table row by row, the database uses the index to quickly locate the matching entries and then jumps directly to their positions in the original table.

A simple way to think about an index is a contact list. Even if contacts are stored in the order they were added, it’s much faster to find a person when the list is organized alphabetically. In the same way, a database index organizes data for fast lookup while still pointing back to the original table where the full records are stored.

Let’s look at the index from the previous example and see how it maps back to the original Friends table:

We can see here that the table has the data stored ordered by an incrementing id based on the order in which the data was added. And the Index has the names stored in alphabetical order.

Types of Indexing

There are two types of databases indexes:

Clustered
Non-clustered

Both clustered and non-clustered indexes are implemented using B-tree data structures. A B-tree is a self-balancing tree that keeps data sorted and enables efficient searching, sequential access, insertions, and deletions—all in logarithmic time.

In simple terms, a B-tree organizes data in a hierarchical, tree-like structure, allowing the database to quickly locate and retrieve information without scanning every row in a table.

Understanding the B-tree Structure The diagram above shows the B-tree created for our index. The smallest value appears on the far left of the tree, while the largest value is on the far right.

Every query begins at the root (top) node and moves downward through the tree. At each step, the database compares the target value with the current node: if the value is smaller, it follows the left branch; if it is larger, it follows the right branch.

In our example, the search is first compared against Matt, then Todd, and finally Zack before reaching the correct result.

To improve performance, B-trees often limit the number of characters stored in each index entry.
This optimization is handled internally by the B-tree and does not require any restrictions on the actual column data. In the example shown, the B-tree limits index entries to four characters.

Clustered Indexes

A clustered index is the unique index in a table that determines how the table’s data is physically organized. It uses the primary key to sort and store rows in increasing order, which is the same order the data is maintained in memory.

In PostgreSQL, a clustered index must be explicitly defined. It is typically created alongside the table and is based on the primary key, which is sorted in ascending order by default.

Because the table’s data follows the order of the clustered index, accessing rows by the primary key is highly efficient.

Creating Clustered Indexes

The clustered index will be automatically created when the primary key is defined:

_{CREATE TABLE friends (id INT PRIMARY KEY, name VARCHAR, city VARCHAR);}

Once filled in, that table would look something like this:

_{The created table, “friends” , will have a clustered index automatically created, organized around the Primary Key “id” called “friends_pkey”:}

When querying the table by id, the ascending order of the column enables highly efficient searches. Because the values are sorted, the database can traverse the B-tree structure and locate records in logarithmic time.

However, if we search by name or city, the database must scan every row in the table since these columns do not have an index. This full table scan is far less efficient, especially as the table grows.

This is where non-clustered indexes become especially valuable, as they allow fast lookups on columns that are not part of the table’s physical ordering.

Non-Clustered Indexes

Non-clustered indexes are separate, sorted structures built on a specific column from the main table. Instead of storing the actual table data, they contain references (pointers) that link back to the corresponding rows in the original table.

The first example we discussed is a classic illustration of a non-clustered index, where the index is maintained independently of the table’s physical data order.

Non-clustered indexes are used to improve query performance by creating searchable structures on specific columns. Unlike clustered indexes, they do not change the physical order of the table’s data.

These indexes can be added by developers or data analysts after a table has already been created and populated with data.

A non-clustered index stores pointers to the memory locations of rows rather than the actual data itself. Because of this extra lookup step, non-clustered indexes are generally slower than clustered indexes, but they are still significantly faster than querying a column with no index at all.

Multiple non-clustered indexes can be created on a single table. For example, SQL Server supports up to 999 non-clustered indexes per table, while PostgreSQL does not impose a strict limit.

Creating Non-Clustered Databases(PostgreSQL)

Partial Indexes:

A partial index indexes only a subset of rows in a table by applying a WHERE clause. The main goal is to improve efficiency by reducing the size of the index. Smaller indexes consume less storage, are cheaper to maintain, and are faster to scan.

For example, imagine users can flag comments on your site, setting a flagged boolean column to true. If you frequently process only flagged comments, you can create a partial index like this:

_{CREATE INDEX articles_flagged_created_at_index}

_{ON articles (created_at) WHERE flagged IS TRUE;}

This index includes only rows where flagged is true, keeping it compact and efficient. It can also be combined with other indexes when executing more complex queries.

Expression Indexes:

Expression indexes are useful when queries filter on computed values rather than raw column data. PostgreSQL allows you to index the result of an expression or function, enabling the database to use an index even when a transformation is applied in the query.

A common example is case-insensitive email lookups. You might store email addresses as entered by users, but perform authentication using a lowercase comparison:

_{WHERE lower(email) = '<lowercased-email>'}

To allow this query to use an index, you can create an expression index:

_{CREATE INDEX users_lower_email_index ON users (lower(email));}

Another frequent use case is querying by date when timestamps are stored with time information. For example, if published_at is a timestamp but queries filter by date, you can create an index like:

_{CREATE INDEX articles_day_index ON articles (date(published_at));}

This index can then be used by queries such as:

_{WHERE date(articles.published_at) = date('2011-03-07');}

Unique Indexes:

A unique index ensures that no more than one row in a table can have the same value for a given column (or set of columns). Creating unique indexes is beneficial for two main reasons: data integrity and performance. Queries that use unique indexes are typically very fast because the database knows there can be only one matching result.

From a data integrity perspective, relying only on application-level validations—such as a validates_uniqueness_ofcheck in an ActiveModel class—is not sufficient. In systems with concurrent users, multiple requests can bypass these checks and create duplicate records. For this reason, uniqueness should always be enforced at the database level, either through a unique index or a unique constraint.

In practice, there is little difference between unique indexes and unique constraints. However, unique indexes operate at a lower level and are more flexible. For example, PostgreSQL allows expression indexes and partial indexes to be defined as unique, which is not possible with standard unique constraints. This means you can even create partial unique indexes on expressions when needed

Searching Indexes:

Once non-clustered indexes are created, they can be used to efficiently execute queries. Indexes rely on an optimized search technique known as binary search.

Binary search works by repeatedly dividing the data set in half and determining whether the target value lies before or after the midpoint. This approach is highly efficient and runs in logarithmic time.

B-trees are especially well suited for binary search. They are structured to start comparisons at middle values, with smaller values organized along the left branches and larger values along the right branches. As a result, the database can quickly navigate the tree to locate the desired entry instead of scanning the entire table.

In contrast, without an index, the database would need to examine each row one by one, which becomes increasingly expensive as the table grows.

When to use Indexing?

Indexes are designed to improve database performance, so they should be used when they provide a meaningful speedup for your queries. As a database grows in size, the benefits of indexing become more noticeable, especially for frequently queried columns and large tables.

That said, indexing should be applied thoughtfully. While indexes speed up read operations, they also add overhead to write operations such as inserts, updates, and deletes.

Testing Index Performance

To determine whether an index improves query performance, you can benchmark your queries before and after creating the index. Start by running a set of queries and recording how long they take to complete. Then, add the appropriate indexes and rerun the same queries to compare the results.

In PostgreSQL, this can be done using the EXPLAIN ANALYZE command, which shows both the query plan and the actual execution time:

_{EXPLAIN ANALYZE SELECT * FROM friends WHERE name = 'Blake';}

This allows you to see whether the database is using an index and how much the execution time has improved.

How Does Search Improve After Implementing Indexing?

Without an index, the database performs a linear scan, checking every row until it finds a match. With indexing in place, the search process changes to a binary search–like approach using a B-tree structure.

This reduces the number of comparisons from N to log(N), significantly lowering the amount of work the database must perform. As a result, CPU usage decreases and queries complete much faster.

Additionally, instead of sequentially scanning many connected memory blocks, the database only needs to access a small number of relevant blocks, further improving efficiency and reducing overall compute cost.

Conclusion: Scaling with Precision

Indexing is the secret to keeping PostgreSQL fast as your data grows. By replacing exhaustive table scans with optimized B-tree structures, you transform slow queries into efficient, logarithmic searches.

The key to success is balance. While indexes significantly speed up reads, they add overhead to every write operation. To maintain peak performance:

Be Selective: Index only the columns used frequently in WHERE clauses.
Use Advanced Types: Leverage partial or expression indexes for specific, high-traffic queries.
Verify Results: Always use EXPLAIN ANALYZE to confirm your index is being used effectively.

By applying these strategies, you ensure your database remains both smart and responsive, no matter how much data you scale.

Thanks

Indexing in Postgres