How transfers work
Prequel performs transfers by querying the source for a given recipient’s data and loading that data into the recipient’s destination, on an ongoing basis. The first transfer that runs for a given destination will automatically load all historical data (the “backfill”), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.
Prequel transfers from source to destination
Authorize source
Prequel authenticates to sources using scoped credentials or delegated roles created by the user. Prequel validates connectivity and restricts permissions to only what is needed to read the configured models for the intended recipient. This provides least-privilege access with clear auditability.
Read, batch, and serialize
Data is read in a sliding window based on time. Each transfer moves a window of data, starting from a checkpoint based on the last batch of data transferred, to ensure data integrity and efficient transfers at scale. When available, Prequel uses a source staging bucket to temporarily store the results of queries as files in object storage, which are then downloaded and normalized.
Prequel uses a lookback window to ensure resiliency against eventual consistency concerns in data sources. For more detail on its mechanics, see Change detection.
Authorize destination
Prequel authenticates to your customer’s destinations using destination-native authentication. Access is scoped to the target schemas and tables, which provides isolation and least-privilege access aligned with destination security.
Load to destination
- Staging-assisted loads: Batches are uploaded to a staging area (for example, a native volume or storage bucket) and then ingested using the destination’s bulk-load path. Data is normalized before staging. Prequel’s transfer logic is designed uniquely for each destination type to maximize throughput and leverage vendor-optimized patterns.
- Direct inserts: For destinations that don’t support staging-assisted loads, batches are streamed directly via insert SQL queries or API calls without external staging. As a result, these destination types can have throughput limitations; contact the Prequel team to learn more about data volumes and throughput across destination types.
- Prequel uses upserts with changes matched on primary key and duplicates resolved via the last modified timestamp to ensure data integrity and protect table state.
- With a Write-Ahead-Publish architecture, your customer never sees data before a transfer is complete and all data is available in the destination.
- Staging files created during transfers are automatically cleaned up after transfer completion. Data is not persisted in the staging area after transfer.
- Each transfer writes metadata to the destination. For object storage locations, see Manifest files for object storage; for warehouses and databases, see Transfer status table.
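The upsert rule above (match on primary key, resolve duplicates by last modified timestamp) can be illustrated with a minimal in-memory sketch. This is not Prequel’s implementation: the function name and the column names id and updated_at are hypothetical, and a real destination would do this with its own merge or bulk-load statements.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def upsert(table: dict[Any, Row], batch: list[Row],
           key: str = "id", modified: str = "updated_at") -> dict[Any, Row]:
    """Merge a batch into a table keyed by primary key.

    `key` and `modified` are hypothetical column names. When the same
    primary key appears more than once, the row with the latest
    last-modified timestamp wins.
    """
    merged = dict(table)  # copy so the input table is not mutated
    for row in batch:
        current = merged.get(row[key])
        if current is None or row[modified] >= current[modified]:
            merged[row[key]] = row
    return merged


existing = {1: {"id": 1, "updated_at": datetime(2024, 1, 1), "plan": "free"}}
incoming = [
    {"id": 1, "updated_at": datetime(2024, 1, 2), "plan": "pro"},
    {"id": 1, "updated_at": datetime(2024, 1, 1, 12), "plan": "trial"},  # older duplicate
    {"id": 2, "updated_at": datetime(2024, 1, 2), "plan": "free"},
]
result = upsert(existing, incoming)
```

Because the merge is keyed and timestamp-ordered, replaying the same batch leaves the table unchanged, which is what makes re-sent rows harmless.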
Transfer lifecycle
Transfers are managed by an internal queue, which is used to dispatch transfers to workers. When a destination has the enabled flag set to true, Prequel will automatically enqueue transfers for that destination based on the frequency value of the destination or the organization’s default frequency.
A transfer resource always has a status corresponding to its current phase of the lifecycle:
| Status | Description |
|---|---|
| PENDING | Transfers start as pending when they are created (enqueued). The submitted_at timestamp records when the transfer was enqueued. |
| RUNNING | The transfer has been dispatched to a worker. The started_at timestamp records when it changed to RUNNING. |
| ERROR | There was an issue dispatching the transfer, the worker failed to connect to the source or destination, or all models failed to transfer. ended_at records the change to ERROR. |
| PARTIAL_FAILURE | The transfer reached the running state, but only some models succeeded while others failed. ended_at records the change to PARTIAL_FAILURE. |
| SUCCESS | The transfer was running and all models transferred without issues. ended_at records the change to SUCCESS. |
| CANCELLED | A user terminated the transfer before it started running. |
| KILLED | A user terminated the transfer while it was running. |
| EXPIRED | The transfer was blocked from being dispatched and remained pending for longer than 6 hours. |
| ORPHANED | The worker died ungracefully or stopped communicating with the control plane. |
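As a rough client-side model of the table above, the sketch below encodes the statuses, the 6-hour pending limit, and how per-model outcomes map to a final status. The helper names are hypothetical, and treating every status other than PENDING and RUNNING as terminal is an assumption inferred from the descriptions, not a documented guarantee.

```python
from datetime import datetime, timedelta
from enum import Enum


class TransferStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    ERROR = "ERROR"
    PARTIAL_FAILURE = "PARTIAL_FAILURE"
    SUCCESS = "SUCCESS"
    CANCELLED = "CANCELLED"
    KILLED = "KILLED"
    EXPIRED = "EXPIRED"
    ORPHANED = "ORPHANED"


# Assumption: everything except PENDING and RUNNING is a final state.
IN_FLIGHT = frozenset({TransferStatus.PENDING, TransferStatus.RUNNING})
PENDING_TIMEOUT = timedelta(hours=6)


def is_terminal(status: TransferStatus) -> bool:
    return status not in IN_FLIGHT


def pending_too_long(submitted_at: datetime, now: datetime) -> bool:
    """True once a pending transfer has waited longer than 6 hours."""
    return now - submitted_at > PENDING_TIMEOUT


def final_status(model_succeeded: list[bool]) -> TransferStatus:
    """Map per-model outcomes of a transfer that reached RUNNING.

    An empty list is treated as ERROR here; that edge case is an assumption.
    """
    if model_succeeded and all(model_succeeded):
        return TransferStatus.SUCCESS
    if any(model_succeeded):
        return TransferStatus.PARTIAL_FAILURE
    return TransferStatus.ERROR
```

A polling client could stop waiting as soon as is_terminal returns true for the status it reads back.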
Backfills & full refreshes
The initial transfer (or “backfill”) is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination. To trigger a full refresh manually, add "full_refresh": true to a transfer request. Prequel only triggers a full refresh automatically on the first transfer, either to a new destination or a new model.
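As a small illustration of the manual trigger, the sketch below builds a JSON request body carrying the full_refresh flag. Only the "full_refresh": true field comes from the text above; the helper name is hypothetical, and the endpoint, authentication, and any other body fields are left out because they are defined in the API Reference rather than here.

```python
import json


def transfer_request_body(full_refresh: bool = False) -> str:
    """Build a JSON body for a transfer request (hypothetical helper).

    The flag is only included when a full refresh is wanted, so a default
    request stays an ordinary incremental transfer.
    """
    body: dict[str, object] = {}
    if full_refresh:
        body["full_refresh"] = True
    return json.dumps(body)


print(transfer_request_body(full_refresh=True))  # {"full_refresh": true}
```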
Backfill vs. incremental transfer performance: Because the initial backfill is often the most storage- and compute-intensive transfer, its sync time and performance should not be used as an indicator of ongoing transfer statistics.
A full refresh reloads each table in one of two ways, depending on the schema:
- Truncate: If the schema matches, the table is truncated before reloading data.
- Drop & Recreate: If there is a schema mismatch, the table is dropped and recreated with the correct schema.
A full refresh is recommended when:
- A customer accidentally drops or overwrites one or more tables in their destination system.
- A new column was added and historical data needs to be backfilled.
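The choice between the Truncate and Drop & Recreate behaviors described above can be sketched as a simple schema comparison. This is an illustration only: representing a schema as a mapping of column name to type, and treating "matches" as exact equality of that mapping, are both assumptions.

```python
def full_refresh_action(expected_schema: dict[str, str],
                        actual_schema: dict[str, str]) -> str:
    """Pick how a table is reset before a full refresh reloads it.

    Schemas are modeled as {column_name: column_type}; this comparison
    rule is an assumption, not Prequel's documented matching logic.
    """
    if expected_schema == actual_schema:
        return "TRUNCATE"
    return "DROP_AND_RECREATE"


model = {"id": "bigint", "email": "text", "updated_at": "timestamp"}
stale_table = {"id": "bigint", "updated_at": "timestamp"}  # missing a new column
```

Here full_refresh_action(model, stale_table) selects the drop-and-recreate path, which corresponds to the case where a new column was added to the model.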
Incremental transfers
After each transfer (backfill or incremental), Prequel will record the most recent last modified timestamp value transferred. This value will be used as the starting point for the subsequent transfer. By default, every transfer of a given model (after a successful backfill) will be an “incremental transfer.”
Incremental updates and eventually consistent data sources: By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.
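The checkpoint and lookback behavior can be sketched as two small functions: one that derives the lower bound of the next query, and one that advances the checkpoint after a transfer. The names and the 5-minute lookback are hypothetical; the actual lookback duration and query mechanics are covered in Change detection.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Hypothetical value for illustration; the real lookback is not stated here.
LOOKBACK = timedelta(minutes=5)


def window_start(checkpoint: Optional[datetime],
                 lookback: timedelta = LOOKBACK) -> Optional[datetime]:
    """Lower bound for the next query; None means no bound (a backfill)."""
    if checkpoint is None:
        return None
    return checkpoint - lookback


def advance_checkpoint(checkpoint: Optional[datetime],
                       transferred: Iterable[datetime]) -> Optional[datetime]:
    """Return the most recent last-modified value seen so far."""
    candidates = list(transferred)
    if checkpoint is not None:
        candidates.append(checkpoint)
    return max(candidates) if candidates else None


checkpoint = datetime(2024, 1, 1, 12, 0)
start = window_start(checkpoint)  # 2024-01-01 11:55 with the assumed lookback
```

Rows inside the lookback window may be read more than once; because loads are upserts matched on primary key, re-reading them does not create duplicates.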
Transfer parallelism and concurrency
Transfer concurrency: Within an individual transfer, operations are optimistically concurrent. Transfers can download, upload, or serialize multiple data files concurrently, regardless of the model to which they belong. The max_concurrent_queries_per_transfer field on a source or destination limits the number of concurrent queries or API calls that can be made against the source or destination. The default for max_concurrent_queries_per_transfer is 1.
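One common way to enforce a cap like max_concurrent_queries_per_transfer is a semaphore around each query. The sketch below shows that general technique with asyncio; it is an assumption about how such a limit could be applied, not a description of Prequel’s workers, and run_queries and fake_query are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def run_queries(queries: list[Callable[[], Awaitable[object]]],
                      max_concurrent: int = 1) -> list[object]:
    """Run all queries, never more than `max_concurrent` at a time.

    The default of 1 mirrors the documented default for
    max_concurrent_queries_per_transfer.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(query: Callable[[], Awaitable[object]]) -> object:
        async with semaphore:  # wait for a free slot before querying
            return await query()

    # gather preserves the order of the input queries in its results
    return await asyncio.gather(*(guarded(q) for q in queries))


async def fake_query() -> str:
    await asyncio.sleep(0.01)  # stands in for a source query or API call
    return "ok"


results = asyncio.run(run_queries([fake_query] * 4, max_concurrent=2))
```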
Transfer parallelism: Transfers can run in parallel with each other as long as the following constraints hold:
- No simultaneous transfers are allowed for the same model to the same destination.
- No simultaneous integrity and transfer jobs can run against the same destination.
- The max_concurrent_transfers field exists on both the source and destination. It defaults to 10 for sources and 1 for destinations. This field represents a hard limit on the number of simultaneous transfers involving a particular source or destination.
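The constraints above can be read as an admission check that runs before a transfer starts. The sketch below is one possible encoding under stated assumptions: the Job type and can_start_transfer function are hypothetical, and it assumes that integrity jobs do not count toward max_concurrent_transfers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    source: str
    destination: str
    models: frozenset
    kind: str = "transfer"  # "transfer" or "integrity" (hypothetical labels)


def can_start_transfer(candidate: Job, running: list[Job],
                       source_limit: int = 10,
                       destination_limit: int = 1) -> bool:
    """Check a new transfer against the documented constraints."""
    same_destination = [j for j in running if j.destination == candidate.destination]
    # No simultaneous integrity and transfer jobs on one destination.
    if any(j.kind == "integrity" for j in same_destination):
        return False
    transfers_to_destination = [j for j in same_destination if j.kind == "transfer"]
    # No simultaneous transfers of the same model to the same destination.
    if any(j.models & candidate.models for j in transfers_to_destination):
        return False
    # Hard limits from max_concurrent_transfers (defaults: 10 and 1).
    transfers_from_source = [j for j in running
                             if j.kind == "transfer" and j.source == candidate.source]
    if len(transfers_from_source) >= source_limit:
        return False
    return len(transfers_to_destination) < destination_limit
```

With the default destination limit of 1, the per-model rule only becomes the deciding factor once max_concurrent_transfers is raised on the destination.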
Tags
Every transfer can carry an arbitrary set of tags: simple key/value metadata you define to group, label, or annotate transfers (for example, by environment, team, or workload). Tag keys and values must match ^[A-Za-z0-9_-]+$.
For more detail on how to use Tags when creating transfers and filtering transfers on tags, refer to our API Reference.
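Tags can be checked client-side against the pattern above before a request is sent. This is a minimal sketch; the invalid_tag_parts helper is hypothetical, and the API remains the authority on what it accepts.

```python
import re

# Pattern quoted from the tag rules above.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def invalid_tag_parts(tags: dict[str, str]) -> list[str]:
    """Return every key or value that does not match the tag pattern.

    fullmatch is used so that a trailing newline is rejected as well.
    """
    parts = [part for pair in tags.items() for part in pair]
    return [part for part in parts if TAG_PATTERN.fullmatch(part) is None]


good = {"env": "prod", "team": "data-eng"}
bad = {"env": "prod us", "cost.center": "42"}
```

Here invalid_tag_parts(good) is empty, while invalid_tag_parts(bad) reports "prod us" (contains a space) and "cost.center" (contains a dot).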