Source queries
Using your own source queries
Prequel has the optional ability to write your own source_query with access to some of the important Prequel variables that will be templated in at runtime.
For example, in the Prequel configuration, you could write:
{
...,
"source_query": "SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rank_order
FROM <schema>.<table>
WHERE tenant_id = {{.IdInProviderSystem}} AND updated_at >= from_unixtime({{.LastModifiedAtEpoch}})
)
WHERE rank_order = 1",
...
}Prequel will still run a query on top of the source_query (to handle column selecting and renaming), so the columns section of the config will not change. This change simply allows you to be more expressive around how Prequel reads data, and provides the option to specifically leverage some of Prequel's parameters (like the epoch timestamp) in your query, enabling efficient partition pruning, etc.
Source query configuration is parsed as JSON5, so comments (//and/* */) and line continuation characters (\) are supported. Use these to document complex queries inline.
Note:source_tableis required in the configuration when using the custom query feature.source_tableis a schema-qualified reference to the source table (e.g., source_schema.table_name).
Use case: Skipping the lookback window for batched timestamps
By default, Prequel queries with a lookback window before the last transferred epoch to catch late-arriving rows. When the last_modified column is batched (many rows share the same timestamp), this lookback can unintentionally pull a large set of already-transferred rows on each transfer.
To eliminate the lookback, filter directly on {{.LastModifiedAtEpoch}} in your source query:
{
...,
"source_query": "SELECT * FROM <schema>.<table>
WHERE tenant_id = {{.IdInProviderSystem}}
AND {{.LastModifiedColumnName}} >= from_unixtime({{.LastModifiedAtEpoch}})",
...
}Use case: Using source queries with parameterized views
Source queries can also be used to support parameterized views in sources that support the functionality (e.g., ClickHouse).
{
...,
"source_query": "SELECT * FROM some_parameterized_view(tenant_id = {{.IdInProviderSystem}})",
...
}Use case: Distributing a shared dataset to multiple recipients
If your source table has no tenant column, you can distribute the same data to multiple recipients by combining the organization_column field with a source query.
Set organization_column to a column name that does not exist in the source table (e.g., org_id), then define a source query that injects each recipient's ID as that column:
{
...,
"organization_column": "org_id",
"source_query": "SELECT *, {{.IdInProviderSystem}} AS org_id FROM <schema>.<table>",
...
}At runtime, Prequel substitutes each recipient's id_in_provider_system for {{.IdInProviderSystem}}, creating a virtual org_id column used to route data. org_id does not need to be listed under columns - recipients will not see it in their tables.
Use case: Pre-aggregating metrics
If your source table contains raw event rows, you can pre-aggregate them in the source query so recipients receive summary data directly, without needing to aggregate on their end.
{
...,
"source_query": "SELECT
{{.IdInProviderSystem}} AS tenant_id,
DATE_TRUNC('day', event_time) AS event_date,
event_type,
COUNT(*) AS event_count,
SUM(value) AS total_value
FROM events
WHERE tenant_id = {{.IdInProviderSystem}}
GROUP BY DATE_TRUNC('day', event_time), event_type",
...
}In this case, none of the output columns exist in the source table. All columns returned by the source query must be listed in the model config's columns, otherwise they will not be selected during transfer.
Limitations
- SQL Server sources do not support source queries.
- DDL statements (
CREATE,DROP,ALTER, etc.) are not supported. - The query must return a valid result set that Prequel can query on top of and that matches all
columnsspecified in the model.
Available Prequel variables
| Available Variable | Data Type | Description |
|---|---|---|
{{.LastModifiedAtEpoch}} | integer | The greatest (most recent) epoch Prequel has transferred so far. Sample value: 1681841760 |
{{.IdInProviderSystem}} | string | Tenant ID (available in "multi-tenant table" mode) |
{{.Schema}} | string | Schema name (available in "schema-tenanted database" mode) |
{{.PrimaryKeyColumnName}} | string | The name of the primary key column in the source table. Sample value: id |
{{.LastModifiedColumnName}} | string | The name of the last-modified timestamp column in the source table. Sample value: updated_at |
{{.OrganizationColumnName}} | string | The name of the tenant/organization column in the source table (available in "multi-tenant table" mode). Sample value: tenant_id |
Updated about 2 hours ago