The pg_partman extension
Manage large Postgres tables using the PostgreSQL Partition Manager extension
pg_partman
is a Postgres extension that simplifies the management of partitioned tables. Partitioning refers to splitting a single table into smaller pieces called partitions
. This is done based on the values in a key column or set of columns. Even though partitions are stored as separate physical tables, the partitioned table can still be queried as a single logical table. This can significantly enhance query performance and also help you manage the data lifecycle of tables that grow very large.
While Postgres natively supports partitioning, pg_partman
helps set up and manage partitioned tables by automating steps like creating new partitions and handling the data lifecycle for a given retention policy.
In this guide, we’ll learn how to set up and use the pg_partman
extension with your Neon Postgres project. We'll cover why partitioning is helpful, how to enable pg_partman
, creating partitioned tables, and automating partition maintenance.
note
pg_partman
is an open-source Postgres extension that can be installed in any Neon project using the instructions below. Detailed installation instructions and compatibility information can be found in the pg_partman documentation.
pg_partman
extension
Enable the You can enable the extension by running the following CREATE EXTENSION
statement in the Neon SQL Editor or from a client such as psql
that is connected to Neon.
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql
client with Neon, see Connect with psql.
Version Compatibility:
pg_partman
works with Postgres 14 and above, complementing the native partitioning features introduced in these versions.
Why partition your data?
For tables that grow very large, partitioning offers several benefits:
- Faster queries: Partitioning allows Postgres to quickly locate and retrieve data within a specific partition, rather than scanning the entire table.
- Scalability: Partitioning makes database administration simpler. For example, smaller partitions are easier to load and delete or back up and recover.
- Managing the data lifecycle: Easier management of the data lifecycle by archiving or purging old partitions, which can be moved to cheaper storage options without affecting the active dataset.
Native partitioning vs pg_partman
Postgres supports partitioning tables natively, with the following strategies to divide the data:
- List partitioning: Data is distributed across partitions based on a list of values, such as a category or location.
- Range partitioning: Data is distributed across partitions based on ranges of values, such as dates or numerical ranges.
With native partitioning, you need to manually create and manage partitions for your table.
pg_partman
only supports creating partitions that are number or time-based, with each partition covering a range of values. However, this simplifies the process of creating and managing partitions.
Example: Partitioning user-activity data
Consider a social media platform that tracks user interactions in their website application, such as likes, comments, and shares. The data is stored in a table called user_activities
, where activity_type
stores the type of activity and the other columns store additional information about the activity.
Setting up a partitioned table
Given the large volume of data generated by user interactions, partitioning the user_activities
table can help keep queries manageable. Recent activity data is typically the most interesting for both the platform and its users, so activity_time
is a good candidate to partition on.
To create a partition for each week of activity data, you can run the following query:
This will create a new partition for each week of data in the user_activities
table. We can insert some sample data into the table:
Querying partitioned tables
We can query against the user_activities
table as if it were a single table, and Postgres will automatically route the query to the correct partition(s) based on the activity_time
column.
This query returns the following results:
To see the list of all partitions created for the user_activities
table, you can run the following query:
This will return the following results:
pg_partman
automatically created tables for weekly intervals close to the current data. As more data is inserted, it will create new partitions. Additionally, there is a user_activities_default
table that stores data that doesn't fit into any of the existing partitions.
Data retention policies
To make sure that old data is automatically removed from the main table, you can set up a retention policy:
The background worker process that comes bundled with pg_partman
automatically detaches the old partitions that are older than 4 weeks from the main table. Since we've set retention_keep_table
to true
, the old partitions are kept as separate tables, and not dropped from the database.
Uniqueness constraints
Postgres doesn't support indexes or unique constraints that span multiple tables. Since a partitioned table is made up of multiple physical tables, you can't create a unique constraint that spans all the partitions. For example, the following query will fail:
It returns the following error:
However, when the unique constraint involves partition key columns, Postgres can guarantee uniqueness across all partitions. In this way, different partitions cannot share the same values for the partition key columns, which allows unique constraints to be enforced.
For example, including the activity_time
column in the unique constraint will work because activity_time
is a partition key column:
Conclusion
By leveraging pg_partman
, you can significantly enhance the native partitioning functionality of Postgres, particularly for large-scale and time-series datasets. The extension simplifies partition management, automates retention and archival tasks, and improves query performance.
Reference
Last updated on