Transfer logic

Understanding the Prequel data transfer logic

How transfers work

Prequel performs transfers by querying the source for a given recipient's data and loading that data into the recipient's destination, on an ongoing basis. The first transfer that runs for a given destination will automatically load all historical data (the "backfill"), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.

Backfills & full refreshes

The initial transfer (or "backfill"), is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination.

If, for any reason, a destination needs to be reset (e.g., a destination admin accidentally drops the table), you can trigger a full refresh by adding the "full_refresh": true parameter to a transfer request. This will backfill the entire table as if it were the first transfer.

πŸ“˜

Backfill vs. incremental transfer performance

Because the initial backfill is often the most storage and compute intensive, sync time/performance should not be used as an indicator of ongoing transfer statistics.

Incremental transfers

After each transfer (backfill or incremental) Prequel will record the most recent updated_at value that was transferred. This value will be used as the starting point for the subsequent transfer.

By default, every transfer of a given model (after a successful backfill) will be an "incremental transfer".

πŸ“˜

Incremental updates and eventually consistent data sources

By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.

Required columns on source table

Required columnDescription
Unique ID
(e.g., id)
Every table to be transferred will need a primary key column (e.g., an id column) to facilitate UPDATE/INSERT ("upsert") operations on the destination.
Last modified
(e.g. updated_at)
Every table to be transferred will need to be configured with a column to indicate when it was last modified (i.e., an updated_at column). This column should contain timestamp data and will be used by Prequel to identify changes between transfers.
Tenant ID
(e.g., organization_id)
Every source table will need some way to indicate its recipient. Prequel supports two tenancy modes: multi-tenant tables and schema-tenanted databases. For multi-tenant source tables, Prequel requires an organization_id column to filter the source data by tenant ID. To read more about the different tenancy modes, you can read the multi-tenancy docs.

Staging buckets

Some sources and destinations supported by Prequel may require staging buckets to efficiently transfer data. Where possible, Prequel will use built in staging resourced provided by the database or data warehouse, but in cases where it does not exist, it may need to be provided. The source/destination documentation will provide instructions for configuring staging buckets where needed.