Windowed transfers
Automatically splitting up transfers into smaller chunks
Overview
By default, each transfer moves all available, newly updated data. There are cases where that behavior is not optimal: perhaps the volume of data is large, and the recipient does not want to wait for all the data to be backfilled before starting to consume it.
It is possible to configure Prequel to split up any transfer into smaller windows of an arbitrary size. This configuration happens on a per-model basis. If a transfer fails during a given window, the next transfer will pick things up starting at that window again. In other words, progress from previous windows will be saved even in the case of an error.
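The resume behavior described above can be pictured with a small sketch. This is not Prequel's implementation: `run_windows`, the `checkpoint` index, and the `transfer_window` callback are hypothetical names, and the sketch only assumes that windows are processed in order and that a failure stops the run at the failing window.

```python
from typing import Callable, List, Tuple

# A window is a (start_epoch, end_epoch) pair; the end is exclusive.
Window = Tuple[int, int]


def run_windows(
    windows: List[Window],
    checkpoint: int,
    transfer_window: Callable[[Window], None],
) -> int:
    """Process windows in order, starting at index `checkpoint`.

    Returns the index of the first window that has NOT completed, so that
    a later run can resume from exactly that window.
    """
    for index in range(checkpoint, len(windows)):
        try:
            transfer_window(windows[index])
        except Exception:
            # Stop at the failing window; earlier windows remain done.
            return index
    return len(windows)
```

Passing the returned index back in as `checkpoint` on the next run retries the failed window without redoing the windows that already succeeded.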
How it works
To leverage windowed transfers, two new fields must be specified on each Prequel model:
- max_transfer_time_window_seconds: this specifies the largest possible time increment Prequel will use when moving data for this model. For example, say this value is set to 1 day ("max_transfer_time_window_seconds": 86400), and a transfer is triggered that would transfer 10 days of data. This transfer will automatically be broken up into 10 one-day windows.
- min_start_time_epoch: this specifies the time at which to start backfilling this model for any new destination. The easiest way to illustrate why this value is necessary is through an example. Let's assume that max_transfer_time_window_seconds is set to 86400, i.e. one day. Each Prequel transfer will now try to move at most one day's worth of data.
  By default, Prequel will try to move data starting at epoch 0 (January 1 1970 at midnight UTC). Starting at that date, and moving data one day at a time, it will take over 19,000 windows to catch up to today. In other words, the backfill might take a long while and put unnecessary load on your source datastore.
  The min_start_time_epoch value lets you specify a more reasonable starting point. What a good value is depends, among other things, on your own data. Perhaps your dataset starts January 1 2023. In this case, it will "only" take ~400 windows to catch up.
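To make the window arithmetic concrete, here is a minimal sketch of how a time range could be cut into windows of at most max_transfer_time_window_seconds. The helper names `split_into_windows` and `count_windows` are illustrative and are not part of Prequel; the sketch assumes half-open windows and a final window that may be shorter than the maximum.

```python
from typing import Iterator, Tuple


def split_into_windows(
    start_epoch: int, end_epoch: int, max_window_seconds: int
) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, end) windows covering [start_epoch, end_epoch)."""
    if max_window_seconds <= 0:
        raise ValueError("max_window_seconds must be positive")
    cursor = start_epoch
    while cursor < end_epoch:
        # The last window is clipped so it never runs past end_epoch.
        window_end = min(cursor + max_window_seconds, end_epoch)
        yield (cursor, window_end)
        cursor = window_end


def count_windows(start_epoch: int, end_epoch: int, max_window_seconds: int) -> int:
    """Number of windows needed, without materializing them."""
    if max_window_seconds <= 0:
        raise ValueError("max_window_seconds must be positive")
    span = max(0, end_epoch - start_epoch)
    return -(-span // max_window_seconds)  # integer ceiling division


ONE_DAY = 86400
ten_days = list(split_into_windows(0, 10 * ONE_DAY, ONE_DAY))
print(len(ten_days))  # 10
```

Calling `count_windows(0, now, ONE_DAY)` and `count_windows(min_start_time_epoch, now, ONE_DAY)` with the current epoch time as `now` shows how much a later starting point shrinks the backfill.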
Here is an example model definition which leverages those parameters.
{
  "model_name": "logs",
  "columns": ["..."],
  "max_transfer_time_window_seconds": 86400,
  "min_start_time_epoch": 1650000000,
  "source_table": "source_schema.application_logs",
  "source_name": "Example Production Source",
  "organization_column": "organization_id"
}
Recipient-level override
It is possible to override min_start_time_epoch at the recipient level. To do so, populate the default_full_refresh_min_start_time_epoch field on the desired recipient. If specified, this value will apply across all models (even ones that don't explicitly leverage windowed transfers).
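One way to read the override is as a simple precedence rule. The sketch below is an assumption drawn from the word "override" and from the epoch 0 default mentioned earlier, not a description of Prequel's internals; `effective_start_epoch` is a hypothetical helper.

```python
from typing import Optional


def effective_start_epoch(
    model_min_start_time_epoch: Optional[int],
    recipient_default_full_refresh_min_start_time_epoch: Optional[int],
) -> int:
    """Pick the backfill start time for one model and one recipient.

    Assumed precedence: recipient-level value, then model-level value,
    then epoch 0 when neither is set.
    """
    if recipient_default_full_refresh_min_start_time_epoch is not None:
        return recipient_default_full_refresh_min_start_time_epoch
    if model_min_start_time_epoch is not None:
        return model_min_start_time_epoch
    return 0
```

Under this reading, a model with no min_start_time_epoch of its own still starts at the recipient's value, which matches the note that the override applies across all models.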
FAQ
Q: Why doesn't Prequel query for the min_start_value instead of potentially running transfers for empty windows?
A: Queries for minimum values across large-scale datasets can be extremely expensive.