Windowed transfers

Automatically splitting up transfers into smaller chunks

Overview

By default, each transfer moves all newly updated data that is available. There are cases where that behavior is not optimal: perhaps the volume of data is large, and the recipient does not want to wait for all the data to be backfilled before starting to consume it.

It is possible to configure Prequel to split up any transfer into smaller windows of an arbitrary size. This configuration happens on a per-model basis. If a transfer fails during a given window, the next transfer resumes from that same window. In other words, progress from previously completed windows is saved even in the case of an error.
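The resume behavior described above can be pictured with a minimal Python sketch. It illustrates the semantics only and is not Prequel's implementation; `run_transfer`, `move_window`, and the integer `checkpoint` are hypothetical names introduced for this example.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int]  # (start_epoch, end_epoch), end exclusive (assumption)


def run_transfer(
    windows: List[Window],
    checkpoint: int,
    move_window: Callable[[Window], None],
) -> int:
    """Process windows in order, starting at index `checkpoint`.

    Returns the index of the first window that has NOT completed, so the
    next transfer can resume from there.
    """
    for index in range(checkpoint, len(windows)):
        try:
            move_window(windows[index])
        except Exception:
            # Hypothetical failure handling: stop at the failed window.
            # Windows before `index` are already done and are not redone.
            return index
    return len(windows)
```

Passing the returned value back in as `checkpoint` on the next run retries the failed window without repeating the completed ones.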

How it works

To leverage windowed transfers, two new fields must be specified on each Prequel model:

  • max_transfer_time_window_seconds: this specifies the largest possible time increment Prequel will use when moving data for this model.

    For example, say this value is set to 1 day ("max_transfer_time_window_seconds": 86400), and a transfer is triggered that would transfer 10 days of data. This transfer will automatically be broken up into 10 one-day windows.

  • min_start_time_epoch: this specifies the time at which to start backfilling this model for any new destination.

    The easiest way to illustrate why this value is necessary is through an example. Let's assume that max_transfer_time_window_seconds is set to 86400, i.e. one day. Each Prequel transfer will now try to move at most one day's worth of data.

    By default, Prequel will try to move data starting at epoch 0 (January 1, 1970 at midnight UTC). Starting at that date and moving data one day at a time, it will take over 19,000 windows to catch up to today. That could take a long while, and your source datastore might not be happy. The min_start_time_epoch value lets you specify a more reasonable starting point. A good value depends, among other things, on your own data. Perhaps your dataset starts on January 1, 2023. In this case, it will "only" take ~400 windows to catch up.
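The arithmetic behind both fields can be checked with a short Python sketch. It assumes windows are consecutive, half-open ranges of epoch seconds; `split_into_windows` is a hypothetical helper, and the `now` value is an arbitrary illustrative timestamp (2024-02-05 00:00:00 UTC), not something Prequel defines.

```python
from typing import Iterator, Tuple


def split_into_windows(
    start_epoch: int, end_epoch: int, max_window_seconds: int
) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, end) windows no longer than max_window_seconds."""
    if max_window_seconds <= 0:
        raise ValueError("max_window_seconds must be positive")
    cursor = start_epoch
    while cursor < end_epoch:
        # The final window may be shorter than the maximum.
        window_end = min(cursor + max_window_seconds, end_epoch)
        yield (cursor, window_end)
        cursor = window_end


ONE_DAY = 86400

# Ten days of pending data with a one-day maximum window.
ten_days = list(split_into_windows(0, 10 * ONE_DAY, ONE_DAY))
print(len(ten_days))  # 10

# Backfill from epoch 0 versus from 2023-01-01 (1672531200).
now = 1707091200  # illustrative "today": 2024-02-05 00:00:00 UTC
print(len(list(split_into_windows(0, now, ONE_DAY))))           # 19758
print(len(list(split_into_windows(1672531200, now, ONE_DAY))))  # 400
```

The exact counts depend on the current date, but the gap between the two starting points is what min_start_time_epoch is meant to close.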

Here is an example model definition that uses these parameters:

{
  "model_name": "logs",
  "columns": ["..."],
  
  "max_transfer_time_window_seconds": 86400,
  "min_start_time_epoch": 1650000000,
  
  "source_table": "source_schema.application_logs",
  "source_name": "Example Production Source",
  "organization_column": "organization_id"
}
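When choosing a min_start_time_epoch, it helps to convert between calendar dates and epoch values. The Python sketch below assumes the field is expressed in seconds since the Unix epoch, which is consistent with the magnitude of the example value; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(year: int, month: int, day: int) -> int:
    """Return the Unix epoch (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def from_epoch_seconds(epoch: int) -> str:
    """Return an ISO 8601 UTC timestamp for a Unix epoch in seconds."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


print(to_epoch_seconds(2023, 1, 1))    # 1672531200
print(from_epoch_seconds(1650000000))  # 2022-04-15T05:20:00+00:00
```

Under that assumption, the example model above starts its backfill on April 15, 2022.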

Recipient-level override

It is possible to override min_start_time_epoch at the recipient level. To do so, populate the default_full_refresh_min_start_time_epoch field on the desired recipient. If specified, this value will apply across all models (even ones that don't explicitly leverage windowed transfers).
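One way to read the override is as a simple precedence rule. The Python sketch below assumes the recipient-level value, when present, wins over the model-level value, and that epoch 0 is used when neither is set; `effective_min_start_time_epoch` is a hypothetical helper, not part of Prequel's API.

```python
from typing import Optional


def effective_min_start_time_epoch(
    model_min_start: Optional[int],
    recipient_default: Optional[int],
) -> int:
    """Pick the backfill start: recipient override, then model value, then epoch 0."""
    if recipient_default is not None:
        # Assumed to map to default_full_refresh_min_start_time_epoch.
        return recipient_default
    if model_min_start is not None:
        # Assumed to map to the model's min_start_time_epoch.
        return model_min_start
    return 0  # Default starting point described above.


print(effective_min_start_time_epoch(1650000000, 1672531200))  # 1672531200
print(effective_min_start_time_epoch(1650000000, None))        # 1650000000
print(effective_min_start_time_epoch(None, None))              # 0
```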

FAQ

Q: Why doesn't Prequel query the source for the earliest timestamp and start there, instead of potentially running transfers for empty windows?
A: Queries for minimum values across large-scale datasets can be extremely expensive. An explicitly configured min_start_time_epoch avoids that cost.