Source queries

Using your own source queries

Prequel has the optional ability to write your own source_query with access to some of the important Prequel variables that will be templated in at runtime.

For example, in the Prequel configuration, you could write:

{
  ...,
  "source_query": "SELECT * FROM (
  	SELECT *,  ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rank_order
		FROM <schema>.<table>
			WHERE tenant_id = {{.IdInProviderSystem}} AND updated_at >= from_unixtime({{.LastModifiedAtEpoch}})
		)
		WHERE rank_order = 1",
	...
}

Prequel will still run a query on top of the source_query (to handle column selecting and renaming), so the columns section of the config will not change. This change simply allows you to be more expressive around how Prequel reads data, and provides the option to specifically leverage some of Prequel's parameters (like the epoch timestamp) in your query, enabling efficient partition pruning, etc.

📘

Source query configuration is parsed as JSON5, so comments (// and /* */) and line continuation characters (\) are supported. Use these to document complex queries inline.

📘

Note: source_table is required in the configuration when using the custom query feature. source_table is a schema-qualified reference to the source table (e.g., source_schema.table_name).

Use case: Skipping the lookback window for batched timestamps

By default, Prequel queries with a lookback window before the last transferred epoch to catch late-arriving rows. When the last_modified column is batched (many rows share the same timestamp), this lookback can unintentionally pull a large set of already-transferred rows on each transfer.

To eliminate the lookback, filter directly on {{.LastModifiedAtEpoch}} in your source query:

{
  ...,
  "source_query": "SELECT * FROM <schema>.<table>
    WHERE tenant_id = {{.IdInProviderSystem}}
      AND {{.LastModifiedColumnName}} >= from_unixtime({{.LastModifiedAtEpoch}})",
  ...
}

Use case: Using source queries with parameterized views

Source queries can also be used to support parameterized views in sources that support the functionality (e.g., ClickHouse).

{
  ...,
  "source_query": "SELECT * FROM some_parameterized_view(tenant_id = {{.IdInProviderSystem}})",
	...
}

Use case: Distributing a shared dataset to multiple recipients

If your source table has no tenant column, you can distribute the same data to multiple recipients by combining the organization_column field with a source query.

Set organization_column to a column name that does not exist in the source table (e.g., org_id), then define a source query that injects each recipient's ID as that column:

{
  ...,
  "organization_column": "org_id",
  "source_query": "SELECT *, {{.IdInProviderSystem}} AS org_id FROM <schema>.<table>",
  ...
}

At runtime, Prequel substitutes each recipient's id_in_provider_system for {{.IdInProviderSystem}}, creating a virtual org_id column used to route data. org_id does not need to be listed under columns - recipients will not see it in their tables.

Use case: Pre-aggregating metrics

If your source table contains raw event rows, you can pre-aggregate them in the source query so recipients receive summary data directly, without needing to aggregate on their end.

{
  ...,
  "source_query": "SELECT
    {{.IdInProviderSystem}} AS tenant_id,
    DATE_TRUNC('day', event_time) AS event_date,
    event_type,
    COUNT(*) AS event_count,
    SUM(value) AS total_value
    FROM events
    WHERE tenant_id = {{.IdInProviderSystem}}
    GROUP BY DATE_TRUNC('day', event_time), event_type",
  ...
}

In this case, none of the output columns exist in the source table. All columns returned by the source query must be listed in the model config's columns, otherwise they will not be selected during transfer.

Limitations

  • SQL Server sources do not support source queries.
  • DDL statements (CREATE, DROP, ALTER, etc.) are not supported.
  • The query must return a valid result set that Prequel can query on top of and that matches all columns specified in the model.

Available Prequel variables

Available VariableData TypeDescription
{{.LastModifiedAtEpoch}}integerThe greatest (most recent) epoch Prequel has transferred so far. Sample value: 1681841760
{{.IdInProviderSystem}}stringTenant ID (available in "multi-tenant table" mode)
{{.Schema}}stringSchema name (available in "schema-tenanted database" mode)
{{.PrimaryKeyColumnName}}stringThe name of the primary key column in the source table. Sample value: id
{{.LastModifiedColumnName}}stringThe name of the last-modified timestamp column in the source table. Sample value: updated_at
{{.OrganizationColumnName}}stringThe name of the tenant/organization column in the source table (available in "multi-tenant table" mode). Sample value: tenant_id