Change detection

Understanding Prequel's change detection logic

Change detection logic

At a high level, Prequel relies on a last_modified_at-type column to detect changes between transfers. On the first transfer to any destination (or on a full_refresh), all historical data is transferred and the greatest last_modified_at value is recorded. On any subsequent transfer, the data is filtered so that only rows with a last_modified_at greater than the previously recorded value are transferred. This allows Prequel to predictably transfer batches of updated data.

-- simplified incremental transfer query
SELECT * FROM some_source_table
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}})
AND some_organization_id_column = '{{.IdInProviderSystem}}';

Timestamp precision

When the greatest last_modified_at value is recorded, it is stored as an epoch, or "unix timestamp". This value has second precision.

For example, if a batch of data is transferred in which the greatest last_modified_at value was 2025-01-01 01:15:30 UTC, the equivalent epoch integer is stored: 1735694130.
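
This conversion can be reproduced directly in SQL; a minimal sketch in PostgreSQL syntax, assuming the timestamp is interpreted as UTC:

-- illustrative only: reducing a timestamp to a second-precision epoch (PostgreSQL syntax, UTC assumed)
SELECT extract(epoch FROM timestamp '2025-01-01 01:15:30')::bigint;  -- returns 1735694130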

Eventual consistency

In many cases, the source data platform may have eventual consistency concerns. This is especially true when the last_modified_at timestamp is generated by an external system and cannot be guaranteed to arrive as monotonically increasing values. For this reason, Prequel adds a "lookback window" to each incremental transfer. The duration of this lookback window differs by source vendor:

Source Vendor                      Lookback Window
Athena                             5 minutes
PostgreSQL / Aurora PostgreSQL     30 seconds
MySQL / Aurora MySQL               5 seconds
All other vendors                  5 minutes (default)

-- simplified incremental transfer query with lookback window
SELECT * FROM some_source_table
-- lookback duration varies by vendor (e.g., 5 minutes for Athena)
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'
AND some_organization_id_column = '{{.IdInProviderSystem}}';

Duplication behavior in Prequel

How Prequel handles duplicate data depends on your destination type:

Database and data warehouse destinations

No duplicates in your tables. Prequel uses upsert operations to ensure that lookback windows do not create duplicate rows in database destinations like Snowflake, BigQuery, PostgreSQL, etc.

Object storage destinations

Duplicates may occur. Object storage destinations (S3, GCS, Azure Blob Storage) are append-only by design, unless Delta Lake is used. Lookback windows may therefore cause the same data to be written multiple times; the expectation is that the downstream data pipeline deduplicates these rows after they are written to the destination.
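
As one illustrative approach (the table and column names here are hypothetical, not Prequel-defined), a downstream pipeline might keep only the newest version of each row per primary key:

-- illustrative downstream deduplication: keep the newest row per primary key
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY id                    -- primary key column
           ORDER BY last_modified_at DESC     -- newest version first
         ) AS row_num
  FROM landed_object_storage_data             -- hypothetical landing table
) ranked
WHERE row_num = 1;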

Primary key requirements

For all destinations, your source data must have unique primary key values within each transfer batch. Duplicate primary keys in your source data will cause transfer errors.
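
A quick way to verify this ahead of time is a grouped count over the primary key; a minimal sketch with hypothetical names:

-- illustrative check for duplicate primary key values
SELECT id, COUNT(*) AS occurrences
FROM some_source_table
GROUP BY id
HAVING COUNT(*) > 1;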

Interaction with a custom source_query

If you choose to use a source_query instead of the default table query, you may wonder how your source query interacts with the change detection queries.

In cases where a source_query is used, Prequel wraps it in a common table expression and applies the same predicate filtering outside of it:

-- simplified incremental transfer query over a custom source_query
WITH some_source_query AS (...)
SELECT * FROM some_source_query
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'
AND some_organization_id_column = '{{.IdInProviderSystem}}';
📘

Change detection over a source_query

If your source_query preemptively filters data using a last_modified_at column, you may bypass the safety of the lookback window. In some cases this may be intentional, but the tradeoff should be weighed as you design your upstream data pipeline.
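
For example, a source_query sketched like the following filters at the exact watermark inside the CTE, so late-arriving rows that fall within the lookback window are never selected:

-- hypothetical source_query that pre-filters on last_modified_at,
-- bypassing the lookback window applied by the outer query
SELECT *
FROM some_source_table
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}})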

🚧

Timestamp precision in a source_query resulting in missing rows

Because Prequel only stores the LastModifiedAtEpoch with second precision, comparing that value with a timestamp of higher precision could result in skipped rows.

For example, imagine a scenario where:

  • Prequel observes a last_modified_at time of 2025-01-01 00:00:01.600.
  • Prequel records this as the epoch 1735689602 (corresponding to 2025-01-01 00:00:02, the observed time rounded to second precision).
  • A custom source_query is used which does not account for rounding/second precision:
    • SELECT * FROM some_source_query WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}})

Since the custom source_query does not also round some_last_modified_at_column to second precision, any rows with timestamp values between 00:00:01.600 and 00:00:02.000 may be skipped.

This can easily be solved either by rounding the some_last_modified_at_column value or by implementing your own lookback window.
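
Sketched in PostgreSQL syntax (and assuming the recorded epoch is rounded to the nearest second, as in the example above), either predicate below closes the gap; adapt the functions to your source's dialect:

-- option 1: round the column to the nearest second before comparing
WHERE date_trunc('second', some_last_modified_at_column + INTERVAL '0.5 second') >= to_timestamp({{.LastModifiedAtEpoch}})

-- option 2: implement your own lookback window
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'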

Rows transferred and delta rows

In the Prequel UI (Transfer details) and API responses, you’ll often see two row counts:

  • Rows (rows_transferred): How many rows were selected from the source for this model in that transfer.
  • Delta rows (delta_rows_transferred): An estimate of how many rows in that transfer have a last_modified_at value newer than the current high watermark, i.e., the greatest last_modified_at recorded by the most recent previously completed transfer for that model.

rows_transferred: how it’s calculated

rows_transferred is an extraction metric: it counts the rows returned by Prequel’s source query.

  • The lower bound is inclusive.
  • The upper bound is exclusive when an end window is specified.
  • A lookback window is applied by querying slightly earlier than the lower bound; rows from this overlap are included in rows_transferred.

-- simplified rows_transferred query (count of extracted rows)
-- lookback duration varies by vendor (see the table above)
SELECT COUNT(*)
FROM some_source_table
WHERE some_last_modified_at_column >= to_timestamp({{.LastModifiedAtEpoch}}) - INTERVAL '5 minutes'
  AND some_last_modified_at_column < to_timestamp({{.EndTransferWindowAtEpoch}})
  AND some_organization_id_column = '{{.IdInProviderSystem}}';
📘

Notes on bounds

The exact lookback duration depends on the source vendor (see the table above). If end_transfer_window_at is not set, the upper-bound predicate may be omitted.

delta_rows_transferred: how it’s calculated

delta_rows_transferred is a calculated estimate derived from the extracted rows. It counts rows whose last_modified_at is strictly newer than the current high watermark.

  • Included: rows where last_modified_at > high watermark (and within the transfer’s upper bound, if one is specified).
  • Excluded: the lookback overlap (to avoid counting that overlap in delta rows).

-- simplified delta_rows_transferred query (count of rows newer than the high watermark)
SELECT COUNT(*)
FROM internal_data_batch
WHERE CAST({{.LastModifiedAtColumnName}} AS TIMESTAMP WITH TIME ZONE) > CAST(to_timestamp({{.HighwaterMarkEpoch}}) AS TIMESTAMP WITH TIME ZONE);

  • Why it can be 0 for historical transfers: If you run a transfer over an older time range, delta_rows_transferred can be 0 even when rows_transferred is large, if none of the rows in that window have a last_modified_at value newer than the high watermark of the most recent completed transfer.