This page is a quick refresher on database indexes. An index is any data structure that aims for faster access/retrieval of data.
The purpose of a SQL index is pretty much the same as in its distant relative – the book index – it allows you to get to the information quickly, but instead of navigating through the book, it indexes a SQL database.
If you create an index based on the primary key, the engine doesn’t have to do a full table scan when searching for a particular row.
How to create an index
-- T-SQL, PL/SQL
CREATE INDEX MyIndex ON MyTable (Column1);
Indexes to prevent a table scan
What is a table
A table is a collection of columns and rows. You could also define a table as a collection of tuples.
Get all records that have the last name of ‘Smith’?
The sseudo code would be:
results = []
for row in rows:
if row[2] == 'Smith':
results.append[row]
Time Complexity: O(N)
How can we make it faster?
Use Indexes!
Performance impact of indexes
Indexes don’t come for free.
Rebuilding indexes
What you gain in retrieval speed, you lose in insertion and deletion speed because every time you alter a table, the indexes must be updated accordingly.
If your table is frequently updated, it’s possible that having indexes will cause your database’s overall performance to suffer.
Space
There is also a space penalty, as the indexes take up space in memory or on disk. A single index is smaller than the table because it doesn’t contain all the data, only pointers to the data, but in general, the larger the table, the larger the index.
The reference to rows is stored and not the full rows. This often allows indexes to be stored in memory even for tables that are so large that they can only be stored on disk.
For the pedants, many databases use B+ trees rather than classic B-trees for general database indexes.
Type of indexes
Hash Indexes
Hash indexes are based on a hashtable. Take the same example from above, finding all people with the last name of ‘Smith.’
One solution would be to create a hash table. Most databases support them, but they’re generally not the default type due to their limitations.
Map<String, Row> index = new HashMap<String, Row>();
Pros:
- O(1) average access.
Cons:
- Ranges operations are not supported, e.g., “Find all people who are younger than 45.” You could hash every single number from 1 to 45 and check every hash, though. But that has a linear time performance at worst (O(N)) where N is the number of rows in your table. Instead of O(log N) for the trees.
- Predefine Size: Hash indexes work with pre-defined hash table size. Space wasted, rehashing, etc.
B-tree indexes
B-tree is a data structure most commonly used for database indexes.
Here is the B-Tree for an Age index, for example.
Ranges are supported as opposed to hash indexes.
SELECT * FROM MyTable WHERE Age <= 13
Public class Node {
public Int IndexValue;
public Pointer RowPointer;
}
Examples
{34, 0x875900}
34
is the age, and 0x875900
is a reference to the location of the data.
For example, using the tree above, to get the records for all people younger than 13 requires looking at only the left branch of the tree root.
Pros:
- Range supported.
- All operations are O(log N) in the worst case (O(N) for hashtable).
- B-tree indexes are typically designed so that each node takes up one disk block. This allows each node to be read in with a single disk operation.
Other indexes
R-tree indexes
Bitmap indexes