source_query with access to some of the important Prequel variables that will be templated in at runtime.
For example, in the Prequel configuration, you could write:
Source query example: deduplication
source_query (to handle column selecting and renaming), so the columns section of the config will not change. This change simply allows you to be more expressive around how Prequel reads data, and provides the option to specifically leverage some of Prequel’s parameters (like the epoch timestamp) in your query, enabling efficient partition pruning, etc.
Source query configuration is parsed as JSON5, so comments (
// and /* */) and line continuation characters (\) are supported. Use these to document complex queries inline.source_table is required in the configuration when using the custom query feature. source_table is a schema-qualified reference to the source table (e.g., source_schema.table_name).Available Prequel variables
| Available Variable | Data Type | Description |
|---|---|---|
{{.LastModifiedAtEpoch}} | integer | The greatest (most recent) epoch Prequel has transferred so far. Sample value: 1681841760 |
{{.LastModifiedAtEpochUpperBound}} | integer | The upper bound epoch (exclusive) for the current transfer. Set when the transfer specifies an end window (for example, a transfer request with end_transfer_window_at, or each window of a windowed transfer); |
{{.IdInProviderSystem}} | string | Tenant ID (available in “multi-tenant table” mode) |
{{.Schema}} | string | Schema name (available in “schema-tenanted database” mode) |
{{.PrimaryKeyColumnName}} | string | The name of the primary key column in the source table. Sample value: id |
{{.LastModifiedColumnName}} | string | The name of the last-modified timestamp column in the source table. Sample value: updated_at |
{{.OrganizationColumnName}} | string | The name of the tenant/organization column in the source table (available in “multi-tenant table” mode). Sample value: tenant_id |
Use cases
Skipping the lookback window for batched timestamps
By default, Prequel queries with a lookback window before the last transferred epoch to catch late-arriving rows. When thelast_modified column is batched (many rows share the same timestamp), this lookback can unintentionally pull a large set of already-transferred rows on each transfer.
To eliminate the lookback, filter directly on {{.LastModifiedAtEpoch}} in your source query:
Source query example: skip lookback window
Using source queries with parameterized views
Source queries can also be used to support parameterized views in sources that support the functionality (e.g., ClickHouse).Source query example: parameterized view
Distributing a shared dataset to multiple recipients
If your source table has no tenant column, you can distribute the same data to multiple recipients by combining theorganization_column field with a source query.
Set organization_column to a column name that does not exist in the source table (e.g., org_id), then define a source query that injects each recipient’s ID as that column:
Source query example: inject recipient ID column
id_in_provider_system for {{.IdInProviderSystem}}, creating a virtual org_id column used to route data. org_id does not need to be listed under columns - recipients will not see it in their tables.
Pre-aggregating metrics
If your source table contains raw event rows, you can pre-aggregate them in the source query so recipients receive summary data directly, without needing to aggregate on their end.Source query example: pre-aggregated metrics
Limitations
- DDL statements (
CREATE,DROP,ALTER, etc.) are not supported. - The query must return a valid result set that Prequel can query on top of and that matches all
columnsspecified in the model.