Transfer logic
Understanding the Prequel data transfer logic
How transfers work
Prequel performs transfers by querying the source for a given recipient's data and loading that data into the recipient's destination, on an ongoing basis. The first transfer that runs for a given destination will automatically load all historical data (the "backfill"), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.
Backfills & full refreshes
The initial transfer (or "backfill"), is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination.
If, for any reason, a destination needs to be reset (e.g., a destination admin accidentally drops the table), you can trigger a full refresh by adding the "full_refresh": true
parameter to a transfer request. This will backfill the entire table as if it were the first transfer.
Backfill vs. incremental transfer performance
Because the initial backfill is often the most storage and compute intensive, sync time/performance should not be used as an indicator of ongoing transfer statistics.
Incremental transfers
After each transfer (backfill or incremental) Prequel will record the most recent updated_at
value that was transferred. This value will be used as the starting point for the subsequent transfer.
By default, every transfer of a given model (after a successful backfill) will be an "incremental transfer".
Incremental updates and eventually consistent data sources
By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.
Required columns on source table
Required column | Description |
---|---|
Unique ID (e.g., id ) | Every table to be transferred will need a primary key column (e.g., an id column) to facilitate UPDATE /INSERT ("upsert") operations on the destination. |
Last modified (e.g. updated_at ) | Every table to be transferred will need to be configured with a column to indicate when it was last modified (i.e., an updated_at column). This column should contain timestamp data and will be used by Prequel to identify changes between transfers. |
Tenant ID (e.g., organization_id ) | Every source table will need some way to indicate its recipient. Prequel supports two tenancy modes: multi-tenant tables and schema-tenanted databases. For multi-tenant source tables, Prequel requires an organization_id column to filter the source data by tenant ID. To read more about the different tenancy modes, you can read the multi-tenancy docs. |
Staging buckets
Some sources and destinations supported by Prequel may require staging buckets to efficiently transfer data. Where possible, Prequel will use built in staging resourced provided by the database or data warehouse, but in cases where it does not exist, it may need to be provided. The source/destination documentation will provide instructions for configuring staging buckets where needed.
Updated 4 months ago