
Transfer logic

Understanding the Prequel data transfer logic

How transfers work

Prequel performs transfers by querying the source for a given recipient's data and loading that data into the recipient's destination, on an ongoing basis. The first transfer that runs for a given destination will automatically load all historical data (the "backfill"), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.

Transfer lifecycle

Transfers are managed by an internal queue, which is used to dispatch transfers to workers. When a destination has the enabled flag set to true, Prequel will automatically enqueue transfers for that destination based on the frequency value of the destination or the organization's default frequency.
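For illustration, enabling scheduled transfers for a destination might look like the sketch below. The host, endpoint path, authentication header, and frequency unit are placeholders, not the documented Prequel API; only the enabled and frequency fields come from this page.

```python
import requests

API_BASE = "https://api.example.com"                # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder auth scheme

# enabled=True tells Prequel to enqueue transfers automatically; frequency
# (minutes assumed here) overrides the organization's default frequency.
resp = requests.patch(
    f"{API_BASE}/destinations/YOUR_DESTINATION_ID",  # hypothetical endpoint
    headers=HEADERS,
    json={"enabled": True, "frequency": 60},
)
resp.raise_for_status()
```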

A transfer resource always has a status corresponding to its current phase of the lifecycle (a polling sketch follows this list):

  • PENDING: Transfers start as pending by default when they are created (enqueued). The submitted_at timestamp on the transfers resource corresponds to when the transfer was enqueued.
  • RUNNING: A transfer is running after it has been dispatched to a worker. The started_at timestamp on the transfers resource corresponds to when the transfer changed to RUNNING.
  • ERROR: A transfer is marked with an error if there is an issue dispatching the transfer, if the worker fails to connect to the source or destination, or if all the models fail to transfer. The ended_at timestamp on the transfers resource corresponds to when the transfer changed to ERROR.
  • PARTIAL_FAILURE: A transfer is considered a partial failure if it reaches the running state but only some models succeed while others fail. The ended_at timestamp on the transfers resource corresponds to when the transfer changed to PARTIAL_FAILURE.
  • SUCCESS: A transfer is successful if it was running and all models transfer without issues. The ended_at timestamp on the transfers resource corresponds to when the transfer changed to SUCCESS.
  • CANCELLED: A transfer is cancelled if a user terminates it before it starts running.
  • KILLED: A transfer is killed if a user terminates it while it is running.
  • EXPIRED: A transfer becomes expired if it is blocked from being dispatched and remains pending for longer than 6 hours.
  • ORPHANED: A transfer becomes orphaned if the worker dies ungracefully or stops communicating with the control plane.
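Because a transfer moves through these statuses asynchronously, a common pattern is to poll until it reaches a terminal state. A minimal sketch follows; the host, endpoint path, and auth are hypothetical assumptions, while the status names and timestamp fields are the ones listed above.

```python
import time
import requests

API_BASE = "https://api.example.com"                # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder auth scheme

# Statuses from which a transfer can make no further progress.
TERMINAL_STATUSES = {"SUCCESS", "ERROR", "PARTIAL_FAILURE", "CANCELLED",
                     "KILLED", "EXPIRED", "ORPHANED"}

def wait_for_transfer(transfer_id: str, poll_seconds: float = 30.0) -> dict:
    """Poll a transfer (hypothetical endpoint) until it reaches a terminal status."""
    while True:
        resp = requests.get(f"{API_BASE}/transfers/{transfer_id}", headers=HEADERS)
        resp.raise_for_status()
        transfer = resp.json()
        if transfer["status"] in TERMINAL_STATUSES:
            # submitted_at / started_at / ended_at mark the lifecycle timestamps.
            return transfer
        time.sleep(poll_seconds)
```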

Backfills & full refreshes

The initial transfer (or "backfill") is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination.

If, for any reason, a destination needs to be reset (e.g., a destination admin accidentally drops the table), you can trigger a full refresh by adding the "full_refresh": true parameter to a transfer request. This will backfill the entire table as if it were the first transfer.
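As a sketch, triggering a full refresh could look like the request below. The "full_refresh": true parameter is the one documented here; the endpoint path, host, auth header, and destination_id field are assumptions for illustration.

```python
import requests

API_BASE = "https://api.example.com"                # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder auth scheme

resp = requests.post(
    f"{API_BASE}/transfers",                        # hypothetical endpoint
    headers=HEADERS,
    json={
        "destination_id": "YOUR_DESTINATION_ID",    # hypothetical field name
        "full_refresh": True,                       # re-backfills the entire table
    },
)
resp.raise_for_status()
```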

📘

Backfill vs. incremental transfer performance

Because the initial backfill is often the most storage- and compute-intensive transfer, its sync time and performance should not be treated as indicative of ongoing incremental transfer performance.

Incremental transfers

After each transfer (backfill or incremental), Prequel will record the most recent updated_at value that was transferred. This value will be used as the starting point for the subsequent transfer.

By default, every transfer of a given model (after a successful backfill) will be an "incremental transfer".

📘

Incremental updates and eventually consistent data sources

By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.
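Conceptually, the incremental source query looks something like the sketch below. The function shape, SQL, and 15-minute lookback window are illustrative assumptions, not Prequel's actual implementation; the updated_at cursor and tenant filter are the mechanics described above.

```python
from datetime import datetime, timedelta

def build_incremental_query(
    table: str,
    tenant_id: str,
    cursor: datetime,
    lookback: timedelta = timedelta(minutes=15),  # assumed window size
):
    """Illustrative shape of an incremental source query (not Prequel's code).

    `cursor` is the most recent updated_at value recorded after the previous
    transfer. Subtracting a lookback window gives eventually consistent
    sources time to converge while still picking up late-arriving rows.
    """
    since = cursor - lookback
    sql = (
        f"SELECT * FROM {table} "
        "WHERE organization_id = %s AND updated_at > %s "
        "ORDER BY updated_at"
    )
    return sql, (tenant_id, since)
```

Rows re-read inside the lookback window are harmless: the unique ID column lets the destination upsert deduplicate them.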

Transfer parallelism and concurrency

Transfer Concurrency:
Within an individual transfer, operations are optimistically concurrent. Transfers can download, upload, or serialize multiple data files concurrently, regardless of the model to which they belong. The max_concurrent_queries_per_transfer field on a source or destination limits the number of concurrent queries or API calls that can be made against that source or destination. The default for max_concurrent_queries_per_transfer is 1.
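The limit behaves like a semaphore around source/destination queries, while file download, serialization, and upload work proceeds outside it. A conceptual sketch, not Prequel's implementation:

```python
import asyncio

async def run_queries(queries, max_concurrent_queries_per_transfer: int = 1):
    """Conceptual model of per-transfer query limiting (not Prequel's code).

    A semaphore sized to max_concurrent_queries_per_transfer (default 1)
    caps concurrent queries/API calls against a source or destination;
    other file work within the transfer is not subject to this limit.
    """
    semaphore = asyncio.Semaphore(max_concurrent_queries_per_transfer)

    async def run_one(query):
        async with semaphore:
            await asyncio.sleep(0)  # stand-in for issuing the real query
            return query

    return await asyncio.gather(*(run_one(q) for q in queries))
```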

Transfer Parallelism:
Transfers can run in parallel with each other as long as the following constraints hold:

  • No simultaneous transfers are allowed for the same model to the same destination.
  • No simultaneous integrity and transfer jobs can run against the same destination.
  • The max_concurrent_transfers field exists on both the source and destination. It defaults to 10 for sources and 1 for destinations. This field represents a hard limit on the number of simultaneous transfers involving a particular source or destination.

Prequel's dispatcher will enforce the above rules. A transfer that cannot be dispatched will remain pending until it can be, as sketched below.
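The rules can be pictured as an eligibility check the dispatcher runs against currently running jobs. The sketch below is a conceptual illustration with assumed field names, not Prequel's implementation.

```python
def can_dispatch(transfer: dict, running_jobs: list[dict]) -> bool:
    """Conceptual sketch of the dispatcher's parallelism rules (assumed field names).

    Each job dict carries source_id, destination_id, model, and job_type;
    the transfer also carries its source's and destination's
    max_concurrent_transfers limits (defaults: 10 for sources,
    1 for destinations).
    """
    for job in running_jobs:
        same_destination = job["destination_id"] == transfer["destination_id"]
        # Rule 1: no simultaneous transfers of the same model to the same destination.
        if same_destination and job["model"] == transfer["model"]:
            return False
        # Rule 2: integrity jobs and transfers may not overlap on a destination.
        if same_destination and job["job_type"] == "integrity":
            return False
    # Rule 3: hard per-source and per-destination transfer limits.
    source_in_use = sum(j["source_id"] == transfer["source_id"] for j in running_jobs)
    dest_in_use = sum(j["destination_id"] == transfer["destination_id"] for j in running_jobs)
    return (source_in_use < transfer["source_max_concurrent_transfers"]
            and dest_in_use < transfer["destination_max_concurrent_transfers"])
```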

Required columns on source table

  • Unique ID (e.g., id): Every table to be transferred will need a primary key column (e.g., an id column) to facilitate UPDATE/INSERT ("upsert") operations on the destination.
  • Last modified (e.g., updated_at): Every table to be transferred will need a column indicating when each row was last modified (i.e., an updated_at column). This column should contain timestamp data and will be used by Prequel to identify changes between transfers.
  • Tenant ID (e.g., organization_id): Every source table will need some way to indicate its recipient. Prequel supports two tenancy modes: multi-tenant tables and schema-tenanted databases. For multi-tenant source tables, Prequel requires an organization_id column to filter the source data by tenant ID. To read more about the different tenancy modes, you can read the multi-tenancy docs.
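For instance, a multi-tenant source table meeting all three requirements could be shaped like the DDL below. The table name, extra columns, and index are illustrative; only the id, updated_at, and organization_id columns reflect the requirements above.

```python
# Illustrative PostgreSQL-flavored DDL held in a Python string; everything
# except the id, organization_id, and updated_at columns is made up.
INVOICES_DDL = """
CREATE TABLE invoices (
    id              UUID PRIMARY KEY,     -- unique ID used for upserts
    organization_id UUID NOT NULL,        -- tenant ID used to filter rows per recipient
    amount_cents    BIGINT NOT NULL,
    updated_at      TIMESTAMPTZ NOT NULL  -- last-modified marker Prequel reads
);
-- An index on (organization_id, updated_at) keeps incremental queries cheap.
CREATE INDEX invoices_tenant_updated_idx ON invoices (organization_id, updated_at);
"""
```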

Staging buckets

Some sources and destinations supported by Prequel may require staging buckets to efficiently transfer data. Where possible, Prequel will use the built-in staging resources provided by the database or data warehouse, but where these do not exist, a staging bucket may need to be provided. The source/destination documentation provides instructions for configuring staging buckets where needed.

Safeguarding user data

As a matter of security and compliance, Prequel does not store or retain any of the data it transfers. Transferred data lives only within the ephemeral worker tasked with running a specific transfer, for the duration of the transfer and for up to 24 hours afterwards. These workers are sandboxed from each other; a dedicated worker is spun up for each transfer and wound down afterwards. To facilitate incremental transfers, Prequel does store the timestamp corresponding to the most recent updated_at value for each transfer run. We consider this to be safe metadata rather than user data.