The pg_partman extension
Manage large Postgres tables using the PostgreSQL Partition Manager extension
pg_partman
is a Postgres extension that simplifies the management of partitioned tables. Partitioning refers to splitting a single table into smaller pieces called partitions
. This is done based on the values in a key column or set of columns. Even though partitions are stored as separate physical tables, the partitioned table can still be queried as a single logical table. This can significantly enhance query performance and also help you manage the data lifecycle of tables that grow very large.
While Postgres natively supports partitioning, pg_partman
helps set up and manage partitioned tables by automating steps like creating new partitions and handling the data lifecycle for a given retention policy.
Try it on Neon!
Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.
In this guide, we’ll learn how to set up and use the pg_partman
extension with your Neon Postgres project. We'll cover why partitioning is helpful, how to enable pg_partman
, creating partitioned tables, and automating partition maintenance.
note
pg_partman
is an open-source Postgres extension that can be installed in any Neon project using the instructions below. Detailed installation instructions and compatibility information can be found in the pg_partman documentation.
pg_partman
extension
Enable the You can enable the extension by running the following CREATE EXTENSION
statement in the Neon SQL Editor or from a client such as psql
that is connected to Neon.
CREATE EXTENSION IF NOT EXISTS pg_partman;
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql
client with Neon, see Connect with psql.
Version Compatibility:
pg_partman
works with Postgres 14 and above, complementing the native partitioning features introduced in these versions.
Why partition your data?
For tables that grow very large, partitioning offers several benefits:
- Faster queries: Partitioning allows Postgres to quickly locate and retrieve data within a specific partition, rather than scanning the entire table.
- Scalability: Partitioning makes database administration simpler. For example, smaller partitions are easier to load and delete or back up and recover.
- Managing the data lifecycle: Easier management of the data lifecycle by archiving or purging old partitions, which can be moved to cheaper storage options without affecting the active dataset.
Native partitioning vs pg_partman
Postgres supports partitioning tables natively, with the following strategies to divide the data:
- List partitioning: Data is distributed across partitions based on a list of values, such as a category or location.
- Range partitioning: Data is distributed across partitions based on ranges of values, such as dates or numerical ranges.
With native partitioning, you need to manually create and manage partitions for your table.
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int
) PARTITION BY RANGE (logdate);
-- Create a partition for each month of logged data
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
-- Moving older data to a different table
ALTER TABLE measurement DETACH PARTITION measurement_y2005m10;
pg_partman
only supports creating partitions that are number or time-based, with each partition covering a range of values. However, this simplifies the process of creating and managing partitions.
Example: Partitioning user-activity data
Consider a social media platform that tracks user interactions in their website application, such as likes, comments, and shares. The data is stored in a table called user_activities
, where activity_type
stores the type of activity and the other columns store additional information about the activity.
Setting up a partitioned table
Given the large volume of data generated by user interactions, partitioning the user_activities
table can help keep queries manageable. Recent activity data is typically the most interesting for both the platform and its users, so activity_time
is a good candidate to partition on.
CREATE TABLE user_activities (
activity_id serial,
activity_time TIMESTAMPTZ NOT NULL,
activity_type TEXT NOT NULL,
content_id INT NOT NULL,
user_id INT NOT NULL
)
PARTITION BY RANGE (activity_time);
To create a partition for each week of activity data, you can run the following query:
SELECT create_parent('public.user_activities', 'activity_time', '1 week');
This will create a new partition for each week of data in the user_activities
table. We can insert some sample data into the table:
INSERT INTO user_activities (activity_time, activity_type, content_id, user_id)
VALUES
('2024-03-15 10:00:00', 'like', 1001, 101),
('2024-03-16 15:30:00', 'comment', 1002, 102),
('2024-03-17 09:45:00', 'share', 1003, 103),
('2024-03-18 18:20:00', 'like', 1004, 104),
('2024-03-19 12:10:00', 'comment', 1005, 105),
('2024-03-20 08:00:00', 'like', 1006, 106),
('2024-03-21 14:15:00', 'share', 1007, 107),
('2024-03-22 11:30:00', 'like', 1008, 108),
('2024-03-23 16:45:00', 'comment', 1009, 109),
('2024-03-24 20:00:00', 'share', 1010, 110),
('2024-03-25 09:30:00', 'like', 1011, 111),
('2024-03-26 13:45:00', 'comment', 1012, 112),
('2024-03-27 17:00:00', 'share', 1013, 113),
('2024-03-28 11:15:00', 'like', 1014, 114),
('2024-03-29 15:30:00', 'comment', 1015, 115);
Querying partitioned tables
We can query against the user_activities
table as if it were a single table, and Postgres will automatically route the query to the correct partition(s) based on the activity_time
column.
SELECT * FROM user_activities WHERE activity_time BETWEEN '2024-03-20' AND '2024-03-25';
This query returns the following results:
activity_id | activity_time | activity_type | content_id | user_id
-------------+------------------------+---------------+------------+---------
16 | 2024-03-20 08:00:00+00 | like | 1006 | 106
17 | 2024-03-21 14:15:00+00 | share | 1007 | 107
18 | 2024-03-22 11:30:00+00 | like | 1008 | 108
19 | 2024-03-23 16:45:00+00 | comment | 1009 | 109
20 | 2024-03-24 20:00:00+00 | share | 1010 | 110
(5 rows)
To see the list of all partitions created for the user_activities
table, you can run the following query:
SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name LIKE 'user_activities_%';
This will return the following results:
table_name
---------------------------
user_activities_p20240329
user_activities_p20240405
user_activities_p20240315
user_activities_p20240322
user_activities_p20240412
user_activities_p20240419
user_activities_p20240426
user_activities_default
user_activities_p20240301
user_activities_p20240308
(10 rows)
pg_partman
automatically created tables for weekly intervals close to the current data. As more data is inserted, it will create new partitions. Additionally, there is a user_activities_default
table that stores data that doesn't fit into any of the existing partitions.
Data retention policies
To make sure that old data is automatically removed from the main table, you can set up a retention policy:
UPDATE part_config
SET retention = '4 weeks', retention_keep_table = true
WHERE parent_table = 'public.user_activities';
The background worker process that comes bundled with pg_partman
automatically detaches the old partitions that are older than 4 weeks from the main table. Since we've set retention_keep_table
to true
, the old partitions are kept as separate tables, and not dropped from the database.
Uniqueness constraints
Postgres doesn't support indexes or unique constraints that span multiple tables. Since a partitioned table is made up of multiple physical tables, you can't create a unique constraint that spans all the partitions. For example, the following query will fail:
ALTER TABLE user_activities ADD CONSTRAINT unique_activity UNIQUE (activity_id);
It returns the following error:
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: UNIQUE constraint on table "user_activities" lacks column "activity_time" which is part of the partition key.
However, when the unique constraint involves partition key columns, Postgres can guarantee uniqueness across all partitions. In this way, different partitions cannot share the same values for the partition key columns, which allows unique constraints to be enforced.
For example, including the activity_time
column in the unique constraint will work because activity_time
is a partition key column:
ALTER TABLE user_activities ADD CONSTRAINT unique_activity UNIQUE (activity_id, activity_time);
Conclusion
By leveraging pg_partman
, you can significantly enhance the native partitioning functionality of Postgres, particularly for large-scale and time-series datasets. The extension simplifies partition management, automates retention and archival tasks, and improves query performance.
Reference
Last updated on