Change detection
Understanding Prequel's change detection logic
Change detection logic
At a high level, Prequel relies on a `last_modified_at`-type column to detect changes between transfers. On the first transfer to any destination (or on a `full_refresh`), all historical data is transferred, and the greatest `last_modified_at` value is recorded. On any subsequent transfer, the data is filtered so that only rows with a `last_modified_at` value greater than the previously recorded one are transferred. This allows Prequel to predictably transfer batches of updated data.
```sql
-- simplified incremental transfer query
SELECT * FROM some_source_table
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}})
  AND some_organization_id_column = '{{.IdInProviderSystem}}';
```
Timestamp precision
When the greatest `last_modified_at` value is recorded, it is stored as an epoch, or "unix timestamp". This value has second precision.

For example, if a batch of data is transferred where the greatest `last_modified_at` timestamp value was `2025-01-01 01:15:30 AM` (UTC), the equivalent epoch integer is stored: `1735694130`.
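As a sanity check of that correspondence (illustrative PostgreSQL-flavored SQL; Prequel performs this conversion internally):

```sql
-- round trip between the recorded epoch and the timestamp
SELECT to_timestamp(1735694130);
-- 2025-01-01 01:15:30+00

SELECT extract(epoch FROM TIMESTAMP WITH TIME ZONE '2025-01-01 01:15:30+00');
-- 1735694130
```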
Eventual consistency
In many cases, the source data platform may have eventual consistency concerns. This is especially true in cases where the `last_modified_at` timestamp is generated by an external system and cannot be guaranteed to be inserted as monotonically increasing values. For this reason, Prequel adds a "lookback window" to each incremental transfer. This lookback window differs by source vendor:

| Source Vendor | Lookback Window |
| --- | --- |
| Athena | 5 minutes |
| PostgreSQL / Aurora PostgreSQL | 30 seconds |
| MySQL / Aurora MySQL | 5 seconds |
| All other vendors | 5 minutes (default) |
```sql
SELECT * FROM some_source_table
-- lookback window varies by vendor (e.g., 5 minutes for Athena)
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'
  AND some_organization_id_column = '{{.IdInProviderSystem}}';
```
Duplication behavior in Prequel
How Prequel handles duplicate data depends on your destination type:
Database and data warehouse destinations
No duplicates in your tables. Prequel uses upsert operations to ensure that lookback windows do not create duplicate rows in database destinations like Snowflake, BigQuery, PostgreSQL, etc.
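As a rough illustration of the upsert semantics (a sketch with hypothetical table and column names, not the statement Prequel actually issues):

```sql
-- minimal sketch of an upsert; rows re-read due to the lookback window
-- match on the primary key and simply overwrite themselves
MERGE INTO destination_table AS dest
USING staged_batch AS src
  ON dest.id = src.id
WHEN MATCHED THEN UPDATE SET
  value = src.value,
  last_modified_at = src.last_modified_at
WHEN NOT MATCHED THEN INSERT (id, value, last_modified_at)
  VALUES (src.id, src.value, src.last_modified_at);
```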
Object storage destinations
Duplicates may occur. Object storage destinations (S3, GCS, Azure Blob Storage) are append-only by design, unless Delta Lake is used. Lookback windows may cause the same data to be written multiple times; the expectation is that the downstream data pipeline deduplicates these rows after they are written to the destination.
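One common downstream pattern, sketched here with illustrative names, keeps only the most recent copy of each primary key:

```sql
-- deduplicate an append-only landing table by primary key,
-- keeping the row with the latest last_modified_at
SELECT id, value, last_modified_at
FROM (
  SELECT id, value, last_modified_at,
         ROW_NUMBER() OVER (
           PARTITION BY id
           ORDER BY last_modified_at DESC
         ) AS rn
  FROM raw_landing_table
) AS ranked
WHERE rn = 1;
```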
Primary key requirements
For all destinations, your source data must have unique primary key values within each transfer batch. Duplicate primary keys in your source data will cause transfer errors.
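A quick pre-transfer check for this (table and column names are illustrative):

```sql
-- surface any primary key values that appear more than once in the batch
SELECT some_primary_key_column, COUNT(*) AS copies
FROM some_source_table
GROUP BY some_primary_key_column
HAVING COUNT(*) > 1;
```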
Interaction with a custom `source_query`

If you choose to use a `source_query` instead of the default table query, you may wonder how your source query interacts with the change detection queries. In cases where a `source_query` is used, Prequel applies the same predicate filtering outside of the custom `source_query`:
```sql
WITH some_source_query AS (...)
SELECT * FROM some_source_query
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'
  AND some_organization_id_column = '{{.IdInProviderSystem}}';
```
Change detection over a `source_query`

If your `source_query` preemptively filters data using a `last_modified_at` column, you may forgo the safety of the lookback window. In some cases this may be intentional, but the tradeoff should be considered as you think about your upstream data pipeline.
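For instance, a `source_query` along these lines (table and column names are illustrative) narrows the data before Prequel's outer predicate and lookback window are ever applied:

```sql
-- hypothetical source_query that pre-filters on its own last_modified_at column;
-- late-arriving rows with timestamps older than this inner cutoff are excluded
-- even if they would have fallen inside Prequel's lookback window
SELECT *
FROM some_source_table
WHERE some_last_modified_at_column >= current_timestamp - INTERVAL '1 hour';
```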
Timestamp precision in a `source_query` resulting in missing rows

Because Prequel only stores the `LastModifiedAtEpoch` with second precision, comparing that value with a timestamp of higher precision could result in skipped rows. For example, imagine a scenario where:

- Prequel observes a `last_modified_at` time of `2025-01-01 00:00:01.600`.
- Prequel records this as the epoch `1735689602` (corresponding to `2025-01-01 00:00:02`).
- A custom `source_query` is used which does not account for rounding/second precision:
```sql
SELECT * FROM some_source_query
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}});
```
Since the custom `source_query` does not also round `some_last_modified_at_column` to second-level precision, any rows with timestamp values between `00:00:01.600` and `00:00:02.000` may be skipped. This can easily be solved by either rounding `some_last_modified_at_column` or implementing your own lookback window.
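Both fixes can be sketched as follows (PostgreSQL-flavored and illustrative; the exact rounding functions and interval syntax vary by dialect):

```sql
-- option 1: conservatively bump and truncate the column to second precision,
-- so a sub-second value like 00:00:01.600 compares as 00:00:02
SELECT * FROM some_source_query
WHERE date_trunc('second', some_last_modified_at_column + INTERVAL '1 second')
      >= to_timestamp({{.LastModifiedAtEpoch}});

-- option 2: build your own lookback window into the comparison
SELECT * FROM some_source_query
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes';
```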
Calculating deltas

In many Prequel reports, a count of "delta" rows transferred is calculated. This count does not use a lookback window, and it attempts to report the most accurate count of data that has changed since the last transfer (without the risk of sacrificing data integrity).
```sql
SELECT COUNT(*)
FROM internal_data_batch
WHERE CAST({{.LastUpdatedAtColumnName}} AS TIMESTAMP WITH TIME ZONE) > CAST(to_timestamp({{.HighwaterMarkEpoch}}) AS TIMESTAMP WITH TIME ZONE);
```
Where delta counts are reported
Delta row counts are not displayed in the Prequel dashboard UI. Instead, they are available through the API in the transfer endpoints and the destination endpoints. The dashboard shows total row counts transferred, inclusive of the lookback window.