
# Transfers

> Understanding the Prequel data transfer logic

## How transfers work

Prequel performs transfers by querying the source for a given recipient's data and loading that data into the recipient's destination, on an ongoing basis. The first transfer that runs for a given destination will automatically load all historical data (the "backfill"), and subsequent transfers will attempt to transfer only the data that has changed or been added since the previous transfer.

### Prequel transfers from source to destination

<Steps>
  <Step title="Authorize source">
    Prequel authenticates to [Sources](/export/concepts/sources) using scoped credentials or delegated roles created by the user. Prequel validates connectivity, and restricts permissions to only what is needed to read the configured models for the intended recipient for **least-privilege access with clear auditability.**
  </Step>

  <Step title="Read, batch, and serialize">
    Data is read in a sliding window based on time. Each transfer moves a window of data, starting from a checkpoint based on the last batch of data transferred to ensure **data integrity and efficient transfers at scale**. When available, Prequel uses a source staging bucket to temporarily store the results of queries as files in object storage which are then downloaded and normalized.

    Prequel uses a lookback window to ensure resiliency against **eventual consistency** concerns in data sources. For more detail on its mechanics, see [Change detection](/export/features/change-detection#eventual-consistency).
  </Step>

  <Step title="Authorize destination">
    Prequel authenticates to your customer's destinations using destination-native authentication scoped to the target schemas/tables for **isolation and least-privilege access aligned with destination security.**
  </Step>

  <Step title="Load to destination">
    * **Staging-assisted loads**: Batches are uploaded to a staging area (for example, a native volume or storage bucket) and then ingested using the destination's bulk-load path. Data is normalized before staging. Prequel's transfer logic is designed uniquely for each destination type to **maximize throughput and leverage vendor-optimized patterns.**
    * **Direct inserts**: For destinations that don't support staging-assisted loads, batches are streamed directly via insert SQL queries or API calls without external staging. As a result, these destination types can have throughput limitations; contact the Prequel team to learn more about data volumes and throughput across destination types.
    * Prequel uses upserts with changes matched on primary key and duplicates resolved via the last modified timestamp to ensure **data integrity and protect table state**.
    * With a [Write-Audit-Publish](https://lakefs.io/blog/data-engineering-patterns-write-audit-publish/) architecture, your customer never sees partially loaded data: data becomes visible in the destination only once a transfer is complete.
    * Staging files created during transfers are **automatically cleaned up after transfer completion**. Data is not persisted in the staging area after transfer.
    * Metadata is written to the destination with each transfer. For object storage locations, see [Manifest files for object storage](/export/features/manifest-files-for-object-storage), and for warehouses and databases, see [Transfer status table](/export/features/transfer-status-table).
  </Step>

  <Step title="Transparency and controls">
    Each phase emits structured [logs](/export/monitoring/monitoring) and [metrics](/export/features/usage-data) for **governance and auditability**. Tags can be used to label transfers for filtering and reporting.
  </Step>
</Steps>
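
The upsert behavior described in the load step can be illustrated with a minimal Python sketch. This is not Prequel's implementation: the in-memory `table` dictionary and the column names `id` and `updated_at` are hypothetical stand-ins for a destination table, its primary key, and its last modified column.

```python
from datetime import datetime


def upsert(table, batch, key="id", modified="updated_at"):
    """Merge batch rows into table, a dict keyed by primary key.

    When several versions of a row share a key, the version with the
    latest last modified timestamp wins.
    """
    for row in batch:
        current = table.get(row[key])
        if current is None or row[modified] >= current[modified]:
            table[row[key]] = row
    return table


# Hypothetical destination state and incoming batch.
table = {1: {"id": 1, "name": "old", "updated_at": datetime(2024, 1, 1)}}
batch = [
    {"id": 1, "name": "new", "updated_at": datetime(2024, 1, 3)},
    {"id": 1, "name": "stale", "updated_at": datetime(2024, 1, 2)},
    {"id": 2, "name": "added", "updated_at": datetime(2024, 1, 2)},
]
upsert(table, batch)
```

After the call, key `1` holds the `"new"` version (the `"stale"` duplicate loses on timestamp) and key `2` is inserted. Because the merge is keyed, replaying the same batch leaves the table unchanged.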

## Transfer lifecycle

Transfers are managed by an internal queue, which is used to dispatch transfers to workers. When a destination has the `enabled` flag set to `true`, Prequel will automatically enqueue transfers for that destination based on the `frequency` value of the destination or the organization's default frequency.

A transfer resource always has a status corresponding to its current phase of the lifecycle:

| Status            | Description                                                                                                                                                                         |
| :---------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PENDING`         | Transfers start as pending when they are created (enqueued). The `submitted_at` timestamp records when the transfer was enqueued.                                                   |
| `RUNNING`         | The transfer has been dispatched to a worker. The `started_at` timestamp records when it changed to `RUNNING`.                                                                      |
| `ERROR`           | There was an issue dispatching the transfer, the worker failed to connect to the source or destination, or all models failed to transfer. `ended_at` records the change to `ERROR`. |
| `PARTIAL_FAILURE` | The transfer reached the running state, but only some models succeeded while others failed. `ended_at` records the change to `PARTIAL_FAILURE`.                                     |
| `SUCCESS`         | The transfer was running and all models transferred without issues. `ended_at` records the change to `SUCCESS`.                                                                     |
| `CANCELLED`       | A user terminated the transfer before it started running.                                                                                                                           |
| `KILLED`          | A user terminated the transfer while it was running.                                                                                                                                |
| `EXPIRED`         | The transfer was blocked from being dispatched and remained pending for longer than 6 hours.                                                                                        |
| `ORPHANED`        | The worker died ungracefully or stopped communicating with the control plane.                                                                                                       |
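
As a rough illustration of how the outcome statuses relate to per-model results, the sketch below derives a final status from a mapping of model name to success flag, and checks the 6-hour pending limit. The function names and input shapes are assumptions made for this example, not part of the Prequel API. It covers only the "all models failed" branch of `ERROR`; dispatch and connection failures are not modeled.

```python
from datetime import datetime, timedelta

PENDING_LIMIT = timedelta(hours=6)  # from the EXPIRED row in the table above


def final_status(model_results):
    """Map per-model outcomes (model name -> True if it succeeded) to a status."""
    if not model_results:
        raise ValueError("expected at least one model result")
    succeeded = sum(1 for ok in model_results.values() if ok)
    if succeeded == len(model_results):
        return "SUCCESS"
    if succeeded == 0:
        return "ERROR"
    return "PARTIAL_FAILURE"


def is_expired(submitted_at, now):
    """True if a pending transfer has waited longer than the pending limit."""
    return now - submitted_at > PENDING_LIMIT
```

For example, `final_status({"orders": True, "users": False})` returns `"PARTIAL_FAILURE"`.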

## Backfills & full refreshes

The initial transfer (or "backfill") is often the largest transfer by volume. During this initial sync, all historical data for a given recipient is loaded into the destination.

To trigger a full refresh manually, add `"full_refresh": true` to a [transfer request](/export/api-reference/transfers/create-transfer). Prequel only triggers a full refresh automatically on the first transfer, either to a new destination or a new model.

<Warning>
  **Data impact varies by destination type**

  **Warehouses, databases, and open table formats (OLAP, OLTP, OTF):** All existing data is deleted before reloading. If your source retains only partial history (e.g., a 90-day rolling window), **data outside that range will be lost**. Any date filters explicitly configured in a model's source query **also still apply**.

  **Object storage (non-OTF) & SFTP:** Existing files are not deleted, and a full refresh will produce duplicate data.
</Warning>

<Note>
  **Backfill vs. incremental transfer performance**

  Because the initial backfill is often the most storage- and compute-intensive transfer, its duration and performance should not be used as an indicator of ongoing transfer performance.
</Note>

**Table Reset Behavior:**

For warehouse and database destinations, Prequel determines whether to truncate or drop and recreate the table based on schema compatibility:

* **Truncate:** If the schema matches, the table is truncated before reloading data.
* **Drop & Recreate:** If there is a schema mismatch, the table is dropped and recreated with the correct schema.
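
The decision above can be sketched as a small function. How Prequel actually compares schemas is not described here, so this example assumes a schema is a mapping of column name to type and treats any difference as a mismatch; the function and constant names are hypothetical.

```python
def reset_action(existing_schema, expected_schema):
    """Pick the table reset strategy for a full refresh.

    Both arguments map column name -> type name. Column order is ignored
    because dict equality does not depend on insertion order.
    """
    if existing_schema == expected_schema:
        return "TRUNCATE"
    return "DROP_AND_RECREATE"


expected = {"id": "INTEGER", "email": "TEXT", "updated_at": "TIMESTAMP"}
same = {"email": "TEXT", "id": "INTEGER", "updated_at": "TIMESTAMP"}
missing_column = {"id": "INTEGER", "updated_at": "TIMESTAMP"}
```

Here `reset_action(same, expected)` selects a truncate, while `reset_action(missing_column, expected)` selects a drop and recreate.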

<Tabs>
  <Tab title="When recommended" icon="circle-check">
    - A customer accidentally drops or overwrites one or more tables in their destination system.
    - A new column was added and historical data needs to be backfilled.
  </Tab>

  <Tab title="When not recommended" icon="circle-xmark">
    * **Transfer ended in a non-`SUCCESS` status (`ERROR`, `PARTIAL_FAILURE`, `ORPHANED`):** Prequel tracks sync checkpoints per-model. Any model that did not complete successfully will automatically resume from its last successful checkpoint during the next incremental transfer.
    * **Adding a new model:** Prequel triggers the initial backfill automatically.
    * **Non-persisted data in source:** If your source does not preserve full customer data history, a full refresh transfer will result in data loss.
  </Tab>
</Tabs>

If you're unsure whether a full refresh fits your situation, contact Prequel support to discuss your use case.

## Incremental transfers

After each transfer (backfill or incremental), Prequel records the most recent last modified timestamp value transferred. This value is used as the starting point for the subsequent transfer.

By default, every transfer of a given model (after a successful backfill) will be an "incremental transfer".

<Note>
  **Incremental updates and eventually consistent data sources**

  By default, Prequel will query the source for slightly earlier data than the most recently transferred row. This is to provide a window in which data from eventually consistent sources can converge and still be transferred.
</Note>
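
The checkpoint and lookback mechanics can be sketched as follows. This is a simplified model under stated assumptions: the `updated_at` column name and the 10-minute lookback are hypothetical (the actual lookback duration is not specified on this page), and rows are plain dictionaries rather than query results.

```python
from datetime import datetime, timedelta


def rows_to_transfer(rows, checkpoint, lookback, modified="updated_at"):
    """Select rows modified at or after (checkpoint - lookback)."""
    window_start = checkpoint - lookback
    return [row for row in rows if row[modified] >= window_start]


def next_checkpoint(transferred, previous, modified="updated_at"):
    """The new checkpoint is the latest timestamp transferred, never moving backwards."""
    return max([row[modified] for row in transferred] + [previous])


checkpoint = datetime(2024, 1, 1, 12, 0)
lookback = timedelta(minutes=10)  # hypothetical value
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 11, 40)},  # before the window
    {"id": 2, "updated_at": datetime(2024, 1, 1, 11, 55)},  # late-arriving row
    {"id": 3, "updated_at": datetime(2024, 1, 1, 12, 5)},   # new since checkpoint
]
selected = rows_to_transfer(source_rows, checkpoint, lookback)
checkpoint = next_checkpoint(selected, checkpoint)
```

In this example rows `2` and `3` are selected and the checkpoint advances to 12:05. Rows inside the lookback window may be read more than once across transfers, which is safe when loads are upserts keyed on primary key.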

## Transfer parallelism and concurrency

**Transfer Concurrency:**
Within an individual transfer, operations are optimistically concurrent. Transfers can download, upload, or serialize multiple data files concurrently, regardless of the model to which they belong. The `max_concurrent_queries_per_transfer` field on a [source](/export/api-reference/sources/create-source) or [destination](/export/api-reference/destinations/create-destination) limits the number of concurrent queries or API calls that can be made against the source or destination. The default for `max_concurrent_queries_per_transfer` is `1`.

**Transfer Parallelism:**
Transfers can run in parallel with each other as long as the following constraints hold:

* No simultaneous transfers are allowed for the same model to the same destination.
* No simultaneous integrity and transfer jobs can run against the same destination.
* The `max_concurrent_transfers` field exists on both the source and destination. It defaults to `10` for sources and `1` for destinations. This field represents a hard limit on the number of simultaneous transfers involving a particular source or destination.

Prequel's dispatcher will enforce the above rules. A transfer that is unable to be dispatched will remain pending until it can be dispatched.
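
To make the constraints concrete, here is a minimal sketch of a dispatch check. It is not Prequel's dispatcher: the `Transfer` class, the `can_dispatch` function, and the `integrity_destinations` set (destinations with a running integrity job) are hypothetical. The default limits mirror the documented `max_concurrent_transfers` defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    source: str
    destination: str
    models: frozenset


def can_dispatch(candidate, running, integrity_destinations,
                 source_limit=10, destination_limit=1):
    """Return True if the candidate may start alongside the running transfers."""
    same_source = [t for t in running if t.source == candidate.source]
    same_dest = [t for t in running if t.destination == candidate.destination]
    # No transfer while an integrity job runs against the same destination.
    if candidate.destination in integrity_destinations:
        return False
    # No simultaneous transfers of the same model to the same destination.
    if any(t.models & candidate.models for t in same_dest):
        return False
    # Hard limits on simultaneous transfers per source and per destination.
    if len(same_source) >= source_limit:
        return False
    if len(same_dest) >= destination_limit:
        return False
    return True


running = [Transfer("s1", "d1", frozenset({"orders"}))]
to_busy_destination = Transfer("s1", "d1", frozenset({"users"}))
to_idle_destination = Transfer("s1", "d2", frozenset({"orders"}))
```

With the default destination limit of `1`, `to_busy_destination` stays pending while `to_idle_destination` can be dispatched. Raising `destination_limit` to `2` would allow the `users` transfer to `d1`, but a second `orders` transfer to `d1` would still be blocked by the same-model rule.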

## Tags

Every transfer can carry an arbitrary set of **tags**: simple key/value metadata you define to group, label, or annotate transfers (for example, by environment, team, or workload). Tag keys and values must match `^[A-Za-z0-9_-]+$`.

For more detail on how to use Tags when [creating transfers](/export/api-reference/transfers/create-transfer) and [filtering transfers](/export/api-reference/transfers/list-transfers) on tags, refer to our API Reference.
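
Tags can be checked against the documented pattern before a request is sent. The sketch below is a client-side helper only; the function name is hypothetical, and it assumes tags are held as a dictionary of string keys to string values, which may differ from the shape the API expects.

```python
import re

# Pattern quoted from the tag rules above.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def invalid_tag_parts(tags):
    """Return every key or value that does not match the tag pattern."""
    bad = []
    for key, value in tags.items():
        for part in (key, value):
            # fullmatch also rejects a trailing newline, which `$` alone would allow.
            if not TAG_PATTERN.fullmatch(part):
                bad.append(part)
    return bad
```

For example, `invalid_tag_parts({"env": "prod", "team": "data-eng"})` returns an empty list, while a value such as `"prod us"` is reported because it contains a space.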

## Staging buckets

Some sources and destinations supported by Prequel may require staging buckets to transfer data efficiently. Where possible, Prequel uses built-in staging resources provided by the database or data warehouse; where none exist, a staging bucket may need to be provided. The source/destination documentation provides instructions for configuring staging buckets where needed.
