Stream data from Neon to external data sources via logical replication
Send data to Airbyte, Fivetran, Kafka, and others for real-time analytics and Change Data Capture
Outbound logical replication is available in beta for all Neon projects (outbound = from Neon to other sources). Inbound logical replication (from other sources to Neon) will be available soon. If you want to try it early, tell us here or join our Early Access list.
Logical replication allows you to replicate data from your Neon database to various external systems, enabling Change Data Capture (CDC) and real-time analytics. You can use logical replication to stream data to data warehouses, analytical databases, messaging platforms, event-streaming platforms, and other Postgres databases.
In this blog post, we introduce you to the main concepts and use cases for logical replication. We provide how-to guides for starting to replicate data from Neon to platforms like Airbyte, Confluent, Materialize, Fivetran, and Decodable.
Postgres logical replication: crash course
What is logical replication?
Logical replication allows you to mirror data between a Postgres database and another data source. When you enable logical replication in a Postgres database (a “publisher”), every insert, update, and delete operation is captured and streamed in real-time to another database or data source (the “subscriber”). This ensures that the data in both sources remains synchronized.
What logical replication is not
When starting with logical replication, it’s important to understand that it focuses exclusively on replicating data changes (inserts, updates, deletes), not DDL operations like schema changes or other configuration changes.
In practice, this means if you make changes to the schema in the publisher database (e.g., adding a new column, modifying a table structure, or creating a new index), you will need to manually apply these changes to the subscriber database. The same applies to configuration settings.
Benefits of logical replication
Even with this limitation, logical replication is useful for many reasons:
- Selective data sync. Logical replication allows you to choose specific tables or even individual rows for replication: you can replicate only the data you need, avoiding unnecessary duplication and saving resources. For example, an e-commerce platform can replicate only sales and inventory tables to a reporting database for analytics.
- Real-time data replication. Any changes made in the source database are propagated to the target database almost immediately, ensuring your data is always current in both the publisher and subscriber. This is great for customer-facing applications that require real-time data updates, such as new orders or user activity.
- Production migrations without downtime. Logical replication can be your best friend while migrating a large production database between different database vendors. Since you can’t shut things off, you can rely on logical replication to ensure that data is continuously synchronized between the old and new databases during the migration. Once everything is ready, you can switch connections.
- Safer Postgres upgrades in large databases. Similarly, logical replication can also be useful when handling upgrades of large Postgres databases, ensuring a smooth transition without downtime.
How is logical replication different vs pg_dump/restore?
Both logical replication and pg_dump/restore are used for “copying data” between databases, but they are significantly different. Pg_dump creates static snapshots of the database or selected tables, but it doesn’t keep the target database up-to-date with ongoing changes. All data is loaded into the source database at once, and depending on the dataset size, this might take time.
In contrast, logical replication continuously streams individual data changes from the publisher to the subscriber in real-time. When you enable logical replication, Postgres uses its own mechanisms to create an initial snapshot of the data; once this initial data synchronization is complete, logical replication then streams changes as they occur, keeping the subscriber always up-to-date with the latest changes from the publisher database.
When to use pg_dump/restore or logical replication?
This depends on your needs. Some scenarios require both data sources to be completely in sync, while in others, this might not be necessary or desirable.
How logical replication works in Postgres
📚 Check out our documentation on logical replication for a deeper dive.
Logical replication in Neon works similarly to standard Postgres, using a publish-subscribe model where all changes are tracked via Postgres’ Write-Ahead Log (WAL).
- The publisher, a Neon database, “publishes” data changes to the subscriber, the data’s destination.
- When a subscriber first connects, it takes a snapshot of the data from the publisher and copies it. After this initial transfer, any subsequent changes in the publisher database are sent to the subscriber in real-time.
- WAL records every change made to the data in Postgres databases (WAL plays a core role in the Neon architecture). During logical replication, WAL records all data changes in the publisher, which are then decoded and sent to the subscriber.
- Decoder plugins convert WAL entries into a format that can be understood and processed by the subscriber. These plugins transform WAL entries into logical change records, creating a logical replication stream that the subscriber processes and applies to its dataset.
To ensure data consistency and prevent potential data loss, Postgres uses replication slots. When setting up logical replication, a replication slot is typically created on the publisher database. This slot ensures that the WAL records, containing all changes made to the database, are not deleted until safely replicated to all subscribers. This guarantees that no changes are lost, even if there are network issues or the subscriber is temporarily unavailable.
Setting up logical replication in Neon
By now, Neon supports outbound logical replication in beta (Neon as the publisher). We’re working on supporting inbound logical replication soon (Neon as a subscriber). If you are interested in trying it out, tell us here or join our Early Access list.
Enabling logical replication in Neon is done through the Console. Take into account that once enabled, all computes in your project are restarted, meaning active connections will be dropped and need to reconnect.
To enable logical replication in a Neon project, navigate to your project settings, select Logical Replication, and click Enable:
You can verify that logical replication is enabled by running:
Replication roles
Once you enable logical replication, check that the right roles have the 1REPLICATION1 privilege. The default Postgres role and roles created via the Neon Console, CLI, or API have this privilege; roles created via SQL do not, and this privilege cannot be granted manually.
To verify that a role has the REPLICATION
privilege, run:
Publications
Publications define which tables and changes are replicated from the publisher to the subscribers. To create a publication, use the CREATE PUBLICATION
command. For example, to replicate changes in the users
table, you would use:
You could also create a publication that only publishes specific operations, such as inserts and updates:
To add or remove tables from a publication, you can use the ALTER PUBLICATION
command:
If you need to remove a publication entirely, you would use the DROP PUBLICATION
command:
Understanding replication slots
As we mentioned earlier, replication slots are crucial for ensuring data consistency and preventing data loss. You can create a replication slot manually using:
To remove a replication slot, you can use:
In Neon, inactive replication slots are automatically removed after 75 minutes if other active slots exist. This helps prevent storage bloat. If your setup requires a longer inactivity period, reach out to us so we can modify this for you.
📚 Check out our docs for detailed instructions on setting up logical replication
Get started with logical replication guides
To help you get started with logical replication in Neon, we have a series of guides with step-by-step instructions to stream data from your Neon database to different external platforms and services, including:
Airbyte
Airbyte is an open-source data integration platform that moves data from a source to a destination system. Check out this guide to replicate data from a Neon database to Airbyte.
Bemi
Bemi is an open-source solution that plugs into Postgres and ORMs such as Prisma, TypeORM, SQLAlchemy, and Ruby on Rails to track database changes automatically. Check out this guide to replicate data from a Neon database to Bemi.
DoubleCloud
DoubleCloud is a managed data platform that helps engineering teams build data infrastructure with open-source technologies. Check out this guide to replicate data from a Neon database to ClickHouse via DoubleCloud.
Fivetran
Fivetran is a data pipeline platform that helps you centralize data from disparate sources. Check out this guide to replicate data from a Neon database to Fivetran.
Kafka via Confluent
Confluent is a fully managed, cloud-native real-time data streaming service built on Apache Kafka. Check out this guide to replicate data from a Neon database to Confluent.
Materialize
Materialize is a data warehouse for operational workloads, purpose-built for low-latency applications. Check out this guide to replicate data from a Neon database to Materialize.
Decodable
Decodable is a fully managed platform for ETL, ELT, and stream processing, powered by Apache Flink® and Debezium. Check out this guide to replicate data from a Neon database to Decodable.
Sequin
Sequin is a serverless messaging stream for API integrations. Check out this guide to replicate data from a Neon database to Sequin.
Try it: create a Neon account for free
You can get started with logical replication and Neon right away. Create an account in our Free Plan, no credit card required. And keep an eye on our docs for more logical replication guides to come!