Transfers
Understanding the Prequel data transfer logic
How transfers work
Prequel performs transfers on an ongoing basis by querying the source for a given recipient's data and loading that data into the recipient's destination. The first transfer that runs for a given destination will automatically load all historical data (the "backfill"), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.
Prequel transfers from source to destination
- Authorize Source
- Prequel authenticates to Sources using scoped credentials or delegated roles created by the user. Prequel validates connectivity, and restricts permissions to only what is needed to read the configured models for the intended recipient to ensure least-privilege access with clear auditability.
- Read, Batch, and Serialize
- Data is read in a sliding window based on time. Each transfer moves a window of data, starting from a checkpoint based on the last batch of data transferred to ensure data integrity and efficient transfers at scale. When available, Prequel uses a source staging bucket to temporarily store the results of queries as files in object storage which are then downloaded and normalized.
- Prequel uses a lookback window to ensure resiliency against eventual consistency concerns in data sources. For more detail on its mechanics, see: Change Detection
- Authorize Destination
- Prequel authenticates to your customer's destinations using destination-native authentication scoped to the target schemas/tables for isolation and least‑privilege access aligned with destination security.
- Load to Destination
- Staging-assisted loads: Batches are uploaded to a staging area (for example, a native volume or storage bucket) and then ingested using the destination's bulk-load path. Data is normalized before staging. Prequel's transfer logic is designed uniquely for each destination type to maximize throughput and leverage vendor-optimized patterns.
- Direct inserts: For destinations that don't support staging-assisted loads, batches are streamed directly via SQL insert queries or API calls without external staging. As a result, these destination types can have throughput limitations - contact the Prequel team to learn more about data volumes and throughput across destination types.
- Prequel uses upserts, with changes matched on primary key and duplicates resolved via the last-modified timestamp, to ensure data integrity and protect table state (see the sketch after this list).
- With a Write-Ahead-Publish architecture, your customer never sees data before a transfer is complete and all data is available in the destination.
- Staging files created during transfers are automatically cleaned up after transfer completion. Data is not persisted in the staging area after transfer.
- Metadata is written to each destination with every transfer. For object storage locations, see Manifest files for object storage; for warehouses, see Data warehouses & databases.
- Transparency and controls
- Each phase emits structured logs and metrics for governance and auditability. Tags can be used to label transfers for filtering and reporting.
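To make the upsert behavior described above concrete, here is a minimal Python sketch (illustrative only, not Prequel's implementation) of merging a batch of rows into a table keyed by primary key, where the row with the later last-modified timestamp wins:

```python
from datetime import datetime
from typing import Any

def upsert_batch(
    existing: dict[str, dict[str, Any]],   # current table state, keyed by primary key
    batch: list[dict[str, Any]],           # newly transferred rows
    pk: str = "id",
    last_modified: str = "updated_at",
) -> dict[str, dict[str, Any]]:
    """Merge a batch into the table: insert new keys, and for existing keys
    keep the row with the most recent last-modified timestamp."""
    for row in batch:
        key = row[pk]
        current = existing.get(key)
        if current is None or row[last_modified] >= current[last_modified]:
            existing[key] = row
    return existing

# Example: the later updated_at wins for id "a"; id "b" is inserted as a new row.
table = {"a": {"id": "a", "name": "old", "updated_at": datetime(2024, 1, 1)}}
batch = [
    {"id": "a", "name": "new", "updated_at": datetime(2024, 2, 1)},
    {"id": "b", "name": "fresh", "updated_at": datetime(2024, 2, 1)},
]
print(upsert_batch(table, batch))
```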
Transfer Lifecycle
Transfers are managed by an internal queue, which is used to dispatch transfers to workers. When a destination has the `enabled` flag set to true, Prequel will automatically enqueue transfers for that destination based on the `frequency` value of the destination or the organization's default frequency.
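As a rough illustration of this scheduling rule (the `frequency_minutes` and `last_enqueued_at` field names below are assumptions for the example, not Prequel's API), the enqueue decision can be thought of as:

```python
from datetime import datetime, timedelta

def should_enqueue(destination: dict, org_default_frequency_minutes: int, now: datetime) -> bool:
    """Enqueue a transfer when the destination is enabled and its frequency
    (or the organization's default frequency) has elapsed since the last enqueue."""
    if not destination.get("enabled", False):
        return False
    # Hypothetical fields for illustration: frequency_minutes, last_enqueued_at.
    frequency = destination.get("frequency_minutes") or org_default_frequency_minutes
    last = destination.get("last_enqueued_at")
    return last is None or now - last >= timedelta(minutes=frequency)
```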
A transfer resource always has a status corresponding to its current phase of the lifecycle:
- PENDING: Transfers start as pending by default when they are created (enqueued). The `submitted_at` timestamp on the transfers resource corresponds to when the transfer was enqueued.
- RUNNING: A transfer is running after it has been dispatched to a worker. The `started_at` timestamp on the transfers resource corresponds to when the transfer changed to `RUNNING`.
- ERROR: A transfer is marked with an error if there is an issue dispatching the transfer, if the worker fails to connect to the source or destination, or if all the models fail to transfer. The `ended_at` timestamp on the transfers resource corresponds to when the transfer changed to `ERROR`.
- PARTIAL_FAILURE: A transfer is considered a partial failure if it reaches the running state but only some models succeed while others fail. The `ended_at` timestamp on the transfers resource corresponds to when the transfer changed to `PARTIAL_FAILURE`.
- SUCCESS: A transfer is successful if it was running and all models transfer without issues. The `ended_at` timestamp on the transfers resource corresponds to when the transfer changed to `SUCCESS`.
- CANCELLED: A transfer is cancelled if a user terminates it before it starts running.
- KILLED: A transfer is killed if a user terminates it while it is running.
- EXPIRED: A transfer becomes expired if it is blocked from being dispatched and remains pending for longer than 6 hours.
- ORPHANED: A transfer becomes orphaned if the worker dies ungracefully or stops communicating with the control plane.
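The statuses and the timestamp each one sets can be summarized as follows (an illustrative Python enum, not an official SDK type):

```python
from enum import Enum

class TransferStatus(str, Enum):
    PENDING = "PENDING"                    # submitted_at set when the transfer is enqueued
    RUNNING = "RUNNING"                    # started_at set when dispatched to a worker
    ERROR = "ERROR"                        # ended_at set; dispatch/connection failure or all models failed
    PARTIAL_FAILURE = "PARTIAL_FAILURE"    # ended_at set; some models succeeded, others failed
    SUCCESS = "SUCCESS"                    # ended_at set; all models transferred without issues
    CANCELLED = "CANCELLED"                # terminated by a user before it started running
    KILLED = "KILLED"                      # terminated by a user while it was running
    EXPIRED = "EXPIRED"                    # blocked from dispatch and pending for longer than 6 hours
    ORPHANED = "ORPHANED"                  # worker died ungracefully or stopped communicating
```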
Backfills & full refreshes
The initial transfer (or "backfill") is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination.
If, for any reason, a destination needs to be reset (e.g., a destination admin accidentally drops the table), you can trigger a full refresh by adding the `"full_refresh": true` parameter to a transfer request. This will backfill the entire table as if it were the first transfer.
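A hedged example of such a request in Python: the host, endpoint path, authentication header, and `destination_id` field below are placeholders and assumptions; only the `full_refresh` parameter is taken from the text above. Consult the API Reference for the exact request shape.

```python
import requests

# Illustrative only: the URL and payload shape are assumptions, not the documented API.
resp = requests.post(
    "https://api.prequel.example/transfers",          # placeholder host and path
    headers={"Authorization": "Bearer <API_TOKEN>"},  # placeholder auth scheme
    json={
        "destination_id": "<DESTINATION_ID>",          # placeholder field
        "full_refresh": True,                          # re-backfill the table as if it were the first transfer
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```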
Backfill vs. incremental transfer performance
Because the initial backfill is often the most storage- and compute-intensive transfer, its sync time and performance should not be used as an indicator of ongoing incremental transfer performance.
Table Reset Behavior:
During full refreshes, Prequel determines whether to truncate the existing table or drop and recreate it based on schema compatibility:
- Truncate: If the existing table's schema matches the model (i.e., column types and structures are compatible), Prequel will truncate the table to remove existing data before inserting new data.
- Drop & Recreate: If there's a schema mismatch (e.g., differing column types or structures), Prequel will drop the existing table and recreate it with the correct schema before inserting data.
This logic maximizes efficiency of full refreshes and standardizes logic across all destinations.
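A minimal sketch of this decision, assuming schemas are represented as simple column-name-to-type mappings (a simplification of real schema compatibility checks):

```python
def prepare_table_for_full_refresh(existing_schema: dict[str, str] | None,
                                   model_schema: dict[str, str]) -> str:
    """Decide how to reset a table during a full refresh.

    Schemas here are {column_name: column_type} dicts; this is an
    illustrative simplification, not Prequel's actual compatibility logic."""
    if existing_schema is None:
        return "CREATE"                 # no table yet: create it
    if existing_schema == model_schema:
        return "TRUNCATE"               # compatible schema: clear rows, keep the table
    return "DROP_AND_RECREATE"          # schema mismatch: rebuild with the model's schema
```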
Incremental transfers
After each transfer (backfill or incremental), Prequel will record the most recent `updated_at` value that was transferred. This value will be used as the starting point for the subsequent transfer.
By default, every transfer of a given model (after a successful backfill) will be an "incremental transfer".
Incremental updates and eventually consistent data sources
By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.
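A minimal sketch of how such a read window could be computed, assuming a checkpoint timestamp and an arbitrary example lookback duration (the actual lookback Prequel uses is not specified here):

```python
from datetime import datetime, timedelta, timezone

def incremental_window(last_transferred_updated_at: datetime | None,
                       lookback: timedelta = timedelta(minutes=15),
                       now: datetime | None = None) -> tuple[datetime | None, datetime]:
    """Compute the [start, end) window for the next incremental read.

    The start is the checkpoint (most recent updated_at transferred) minus a
    lookback, so rows from eventually consistent sources that appear late are
    still picked up. A None start means a backfill (read all history)."""
    end = now or datetime.now(timezone.utc)
    if last_transferred_updated_at is None:
        return None, end
    return last_transferred_updated_at - lookback, end
```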
Transfer Parallelism and Concurrency
Transfer Concurrency:
Within an individual transfer, operations are optimistically concurrent. Transfers can download, upload, or serialize multiple data files concurrently, regardless of the model to which they belong. The `max_concurrent_queries_per_transfer` field on a source or destination limits the number of concurrent queries or API calls that can be made against the source or destination. The default for `max_concurrent_queries_per_transfer` is `1`.
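A small sketch of this limit using a semaphore; the query execution itself is stubbed out and the function is illustrative, not part of any Prequel SDK:

```python
import asyncio

async def run_queries(queries: list[str], max_concurrent_queries_per_transfer: int = 1) -> list[str]:
    """Run the queries for a single transfer, never exceeding the configured
    concurrency limit (the default of 1 means fully sequential execution)."""
    semaphore = asyncio.Semaphore(max_concurrent_queries_per_transfer)

    async def run_one(query: str) -> str:
        async with semaphore:
            await asyncio.sleep(0.1)      # stand-in for executing the query against the source
            return f"result of {query}"

    return await asyncio.gather(*(run_one(q) for q in queries))

# asyncio.run(run_queries(["SELECT 1", "SELECT 2"], max_concurrent_queries_per_transfer=2))
```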
Transfer Parallelism:
Transfers can run in parallel with each other as long as the following constraints hold:
- No simultaneous transfers are allowed for the same model to the same destination.
- No simultaneous integrity and transfer jobs can run against the same destination.
- The `max_concurrent_transfers` field exists on both the source and destination. It defaults to `10` for sources and `1` for destinations. This field represents a hard limit on the number of simultaneous transfers involving a particular source or destination.
Prequel's dispatcher will enforce the above rules. A transfer that is unable to be dispatched will remain pending until it can be dispatched.
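A sketch of how these rules might be checked before dispatch, assuming transfers are described by illustrative `source_id`, `destination_id`, `models`, and `kind` fields (these names are assumptions, not Prequel's internal representation):

```python
def can_dispatch(candidate: dict, running: list[dict],
                 max_concurrent_transfers: dict[str, int]) -> bool:
    """Check the parallelism rules before dispatching a transfer."""
    same_dest = [t for t in running if t["destination_id"] == candidate["destination_id"]]

    # Rule 1: no simultaneous transfers for the same model to the same destination.
    if any(set(t["models"]) & set(candidate["models"]) for t in same_dest):
        return False
    # Rule 2: no simultaneous integrity and transfer jobs against the same destination.
    if any(t["kind"] != candidate["kind"] for t in same_dest):
        return False
    # Rule 3: hard per-source and per-destination limits on simultaneous transfers
    # (defaults: 10 for sources, 1 for destinations).
    source_count = sum(1 for t in running if t["source_id"] == candidate["source_id"])
    if source_count >= max_concurrent_transfers.get(candidate["source_id"], 10):
        return False
    if len(same_dest) >= max_concurrent_transfers.get(candidate["destination_id"], 1):
        return False
    return True
```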
Required columns on source table
| Required column | Description |
|---|---|
| Unique ID (e.g., `id`) | Every table to be transferred will need a primary key column (e.g., an `id` column) to facilitate `UPDATE`/`INSERT` ("upsert") operations on the destination. |
| Last modified (e.g., `updated_at`) | Every table to be transferred will need to be configured with a column that indicates when a row was last modified (i.e., an `updated_at` column). This column should contain timestamp data and will be used by Prequel to identify changes between transfers. |
| Tenant ID (e.g., `organization_id`) | Every source table will need some way to indicate its recipient. Prequel supports two tenancy modes: multi-tenant tables and schema-tenanted databases. For multi-tenant source tables, Prequel requires an `organization_id` column to filter the source data by tenant ID. To read more about the different tenancy modes, see the multi-tenancy docs. |
Tags
Every transfer can carry an arbitrary set of tags: simple key/value metadata you define to group, label, or annotate transfers (for example, by environment, team, or workload). Tag keys and values must match `^[A-Za-z0-9_-]+$`.
For more detail on how to use Tags when creating transfers and filtering transfers on tags, refer to our API Reference.
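For example, a small helper (illustrative, not part of any Prequel SDK) that validates tags against this pattern before attaching them to a transfer request:

```python
import re

TAG_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")

def validate_tags(tags: dict[str, str]) -> dict[str, str]:
    """Ensure every tag key and value matches the allowed pattern before
    attaching the tags to a transfer request."""
    for key, value in tags.items():
        if not TAG_PATTERN.match(key) or not TAG_PATTERN.match(value):
            raise ValueError(f"invalid tag: {key}={value}")
    return tags

# Example: group transfers by environment and team.
transfer_tags = validate_tags({"environment": "production", "team": "data-platform"})
```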
Staging buckets
Some sources and destinations supported by Prequel may require staging buckets to efficiently transfer data. Where possible, Prequel will use the built-in staging resources provided by the database or data warehouse, but where none exist, a staging bucket may need to be provided. The source/destination documentation provides instructions for configuring staging buckets where needed.
Safeguarding user data
As a matter of security and compliance, Prequel does not store or retain any of the data it transfers. Transferred data only lives within the ephemeral worker tasked with running a specific transfer, for the duration of the transfer and up to 24 hours afterwards. These workers are sandboxed from each other; a dedicated worker is spun up for each transfer and wound down afterwards. To facilitate incremental transfers, Prequel does store the timestamp corresponding to the most recent `last_updated_at` value for each transfer run. We consider this to be safe metadata rather than user data.