Data models

What are data models?

In this documentation, a data model refers to the Prequel representation of the structured data to be transferred using Prequel. An individual data model usually represents a single table, though (depending on the source/destination) it may also refer to a JSON blob or other file format (e.g., Parquet, CSV).

A Prequel data model represents the data structure that is read from a source and written to a destination

Data model configuration

Each Prequel data model is configured via a JSON file with a specific format. If no model_name is provided, the name of the file (not including the extension) will be the name used for the table in the destination database. For example, for the configuration below, the destination table will appear as logs.

{
  "model_name": "logs",
  "columns": [
    {
      "name_in_source": "id",
      "name_in_destination": "id",
      "data_type": "text",
      "is_primary_key": true
    },
    {
      "name_in_source": "log",
      "name_in_destination": "event_log",
      "description":"A descriptive text entry of the event that occured.",
      "data_type": "text"
    },
    {
      "name_in_source": "updated_at",
      "name_in_destination": "updated_at",
      "data_type": "timestamp",
      "is_last_modified": true
    }
  ],
  "partition_by": ["date(updated_at)"],
  "source_table": "source_schema.application_logs",
  "source_name": "Example Production Source",
  "organization_column": "organization_id"
}

📘
Syncing config files from GitHub
You can leverage the Prequel GitHub Action to automatically sync any config file changes from GitHub to Prequel.

Configuration file keys

Each configuration file has 6 top level config keys: (3 required) columns, source_table, and organization_column, and (3 optional) model_name, description, and source_name.

Config file keys

Config Key	Required	Description
`model_name`		Name of table to be loaded into the destination. If left blank, the file name (not including file extension) will be used.
`description`		Table description. Will be applied to destination tables if provided.
`columns`	✅	A list of all destination columns (for this destination table). This is used to `CREATE` the table in the destination database, so it is important to use accurate types that correspond to the source data. The order of this list dictates the order in which columns appear in the destination database. These columns are also used to create the `READ` and `WRITE` query, so it is important to use the correct `name_in_source` and a descriptive `name_in_destination` for end customers. They can, but do not need to be, the same name. `is_primary_key` and `is_last_modified` are two important column flags that indicate which columns to use for detecting changes and propagating changes to the correct row in the destination. One of each must be present on one of the column objects.
`source_table`	✅	Specifies the name of the source table to be queried.
`source_name`		Specifies the source from which this model should read data. This field is only required if multiple sources exist in the account. If left blank and only one source exists, models will read from that source by default.
`organization_column`	✅	Specifies the column that will be used for filtering data by organization. Because this column is not necessarily a column that you want to transfer to users, it is not required within the `columns` list (unlike `is_primary_key` and `is_last_modified`). This column must be a string or varchar (non-numeric) data type.
`partition_by`		A string list representing the composite partition key in the destination. Values correspond to `name_in_destination` entries from the columns array and support helper functions `day()`, `month()`, `year()`, and `date()` for timestamp columns. No values means no partitioning; one value means one partition; N values mean N levels of partitioning.

Column object keys

Column object key	Required	Description
`name_in_source`	✅
`name_in_destination`	✅
`data_type`	✅
`is_primary_key`	(On exactly one column)
`is_last_modified`	(On exactly one column)
`description`

🚧
Using Acceptable Types
It is important to use the correct data_type for each column to ensure compatibility with all destinations. Refer to the Data Types documentation for the complete list of supported types.

By default, every destination you add to Prequel will be sent all the models defined in the prequel/models directory. Products allow you to only send a subset of those models to destinations.

Reserved column names

Some reserved keywords cannot be used as column names due to reserved statuses in common destination systems. There are 334:

1-67	68-134	135-201	202-268	269-334
`ACCESSIBLE`	`DEFERRABLE`	`INOUT`	`NULL`	`SMALLINT`
`ACCOUNT`	`DEFINE`	`INSENSITIVE`	`NULLS`	`SOME`
`ADD`	`DELAYED`	`INSERT`	`NUMERIC`	`SPATIAL`
`ALL`	`DELETE`	`INT`	`OF`	`SPECIFIC`
`ALTER`	`DENSE_RANK`	`INT1`	`OFFSET`	`SQL`
`ANALYSE`	`DESC`	`INT2`	`ON`	`SQL_BIG_RESULT`
`ANALYZE`	`DESCRIBE`	`INT3`	`ONLY`	`SQL_CALC_FOUND_ROWS`
`AND`	`DETERMINISTIC`	`INT4`	`OPTIMIZE`	`SQL_SMALL_RESULT`
`ANTI`	`DISTINCT`	`INT8`	`OPTIMIZER_COSTS`	`SQLEXCEPTION`
`ANY`	`DISTINCTROW`	`INTEGER`	`OPTION`	`SQLSTATE`
`ARRAY`	`DIV`	`INTERSECT`	`OPTIONALLY`	`SQLWARNING`
`AS`	`DO`	`INTERVAL`	`OR`	`SSL`
`ASC`	`DOUBLE`	`INTO`	`ORDER`	`START`
`ASENSITIVE`	`DROP`	`IO_AFTER_GTIDS`	`ORGANIZATION`	`STARTING`
`ASSERT_ROWS_MODIFIED`	`DUAL`	`IO_BEFORE_GTIDS`	`OUT`	`STORED`
`ASYMMETRIC`	`EACH`	`IS`	`OUTER`	`STRAIGHT_JOIN`
`AT`	`ELSE`	`ISNULL`	`OUTFILE`	`STRUCT`
`AUTHORIZATION`	`ELSEIF`	`ISSUE`	`OVER`	`SYMMETRIC`
`BEFORE`	`EMPTY`	`ITERATE`	`OVERLAPS`	`SYSTEM`
`BETWEEN`	`ENCLOSED`	`JOIN`	`PARTITION`	`TABLE`
`BIGINT`	`END`	`JSON_TABLE`	`PERCENT_RANK`	`TABLESAMPLE`
`BINARY`	`ENUM`	`KEY`	`PLACING`	`TERMINATED`
`BLOB`	`ESCAPE`	`KEYS`	`PRECEDING`	`THEN`
`BOTH`	`ESCAPED`	`KILL`	`PRECISION`	`TINYBLOB`
`BY`	`EXCEPT`	`LAG`	`PRIMARY`	`TINYINT`
`CALL`	`EXCLUDE`	`LAST_VALUE`	`PROCEDURE`	`TINYTEXT`
`CASCADE`	`EXISTS`	`LATERAL`	`PROTO`	`TO`
`CASE`	`EXIT`	`LEAD`	`PURGE`	`TRAILING`
`CAST`	`EXPLAIN`	`LEADING`	`QUALIFY`	`TREAT`
`CHANGE`	`EXTRACT`	`LEAVE`	`RANGE`	`TRIGGER`
`CHAR`	`FALSE`	`LEFT`	`RANK`	`TRUE`
`CHARACTER`	`FETCH`	`LIKE`	`READ`	`UNBOUNDED`
`CHECK`	`FIRST_VALUE`	`LIMIT`	`READ_WRITE`	`UNDO`
`COLLATE`	`FLOAT`	`LINEAR`	`READS`	`UNION`
`COLLATION`	`FLOAT4`	`LINES`	`REAL`	`UNIQUE`
`COLUMN`	`FLOAT8`	`LOAD`	`RECURSIVE`	`UNLOCK`
`CONCURRENTLY`	`FOLLOWING`	`LOCALTIME`	`REFERENCES`	`UNNEST`
`CONDITION`	`FOR`	`LOCALTIMESTAMP`	`REGEXP`	`UNSIGNED`
`CONNECT`	`FORCE`	`LOCK`	`RELEASE`	`UPDATE`
`CONNECTION`	`FOREIGN`	`LONG`	`RENAME`	`USAGE`
`CONSTRAINT`	`FREEZE`	`LONGBLOB`	`REPEAT`	`USE`
`CONTAINS`	`FROM`	`LONGTEXT`	`REPLACE`	`USER`
`CONTINUE`	`FULL`	`LOOKUP`	`REQUIRE`	`USING`
`CONVERT`	`FULLTEXT`	`LOOP`	`RESIGNAL`	`UTC_DATE`
`CREATE`	`FUNCTION`	`LOW_PRIORITY`	`RESTRICT`	`UTC_TIME`
`CROSS`	`GENERATED`	`MASTER_BIND`	`RETURN`	`UTC_TIMESTAMP`
`CUBE`	`GET`	`MASTER_SSL_VERIFY_SERVER_CERT`	`RETURNING`	`VALUES`
`CUME_DIST`	`GRANT`	`MATCH`	`REVOKE`	`VARBINARY`
`CURRENT`	`GROUP`	`MAXVALUE`	`RIGHT`	`VARCHAR`
`CURRENT_CATALOG`	`GROUPING`	`MEDIUMBLOB`	`RLIKE`	`VARCHARACTER`
`CURRENT_DATE`	`GROUPS`	`MEDIUMINT`	`ROLLUP`	`VARIADIC`
`CURRENT_ROLE`	`GSCLUSTER`	`MEDIUMTEXT`	`ROW`	`VARYING`
`CURRENT_SCHEMA`	`HASH`	`MERGE`	`ROW_NUMBER`	`VERBOSE`
`CURRENT_TIME`	`HAVING`	`MIDDLEINT`	`ROWS`	`VIEW`
`CURRENT_TIMESTAMP`	`HIGH_PRIORITY`	`MINUS`	`SAMPLE`	`VIRTUAL`
`CURRENT_USER`	`HOUR_MICROSECOND`	`MINUTE_MICROSECOND`	`SCHEMA`	`WHEN`
`CURSOR`	`HOUR_MINUTE`	`MINUTE_SECOND`	`SCHEMAS`	`WHENEVER`
`DATABASE`	`HOUR_SECOND`	`MOD`	`SECOND_MICROSECOND`	`WHERE`
`DATABASES`	`IF`	`MODIFIES`	`SELECT`	`WHILE`
`DAY_HOUR`	`IGNORE`	`NATURAL`	`SEMI`	`WINDOW`
`DAY_MICROSECOND`	`ILIKE`	`NEW`	`SENSITIVE`	`WITH`
`DAY_MINUTE`	`IN`	`NO`	`SEPARATOR`	`WITHIN`
`DAY_SECOND`	`INCREMENT`	`NO_WRITE_TO_BINLOG`	`SESSION_USER`	`WRITE`
`DEC`	`INDEX`	`NOT`	`SET`	`XOR`
`DECIMAL`	`INFILE`	`NOTNULL`	`SHOW`	`YEAR_MONTH`
`DECLARE`	`INITIALLY`	`NTH_VALUE`	`SIGNAL`	`ZEROFILL`
`DEFAULT`	`INNER`	`NTILE`	`SIMILAR`	``

Grouping data models into products

By default, every destination you add to Prequel will be sent all configured models. Products allow you to only send a subset of those models to destinations.

Data models

What are data models?

Data model configuration

📘
Syncing config files from GitHub

Configuration file keys

Config file keys

Column object keys

🚧
Using Acceptable Types

Reserved column names

Grouping data models into products

What are data models?

Data model configuration

📘Syncing config files from GitHub

Configuration file keys

Config file keys

Column object keys

🚧Using Acceptable Types

Reserved column names

Grouping data models into products

📘
Syncing config files from GitHub

🚧
Using Acceptable Types