s3, gcs, abs, and s3_compat) receive manifest files alongside the data files. This enables any downstream pipeline processing this data to know when a transfer has completed for a given model.
A manifest file is written for each model as part of every transfer. Manifest files live in their own _manifests directory within the object storage destination, and each manifest file is named manifest_{transfer_id}.json as shown below.
some_bucket
  orders
    dt=2024-07-01
      file1.parquet
      file2.parquet
  transactions
    dt=2024-07-01
      file1.parquet
  _manifests
    orders
      dt=2024-07-01
        manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
    transactions
      dt=2024-07-01
        manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
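A downstream pipeline could use this naming convention to check whether a transfer has finished for a model. The following is a minimal sketch, not an official client: it assumes the manifest key mirrors the example layout above (`_manifests/{model}/{partition}/manifest_{transfer_id}.json`, with no additional bucket prefix), and the function names `manifest_key` and `transfer_is_complete` are hypothetical. Listing the bucket is left to whichever storage SDK you use.

```python
from pathlib import PurePosixPath
from typing import Iterable


def manifest_key(model_name: str, partition: str, transfer_id: str) -> str:
    """Build the expected manifest object key.

    Assumption: the key mirrors the example layout in this document and
    is not nested under any extra bucket prefix.
    """
    return str(
        PurePosixPath(
            "_manifests", model_name, partition, f"manifest_{transfer_id}.json"
        )
    )


def transfer_is_complete(
    existing_keys: Iterable[str],
    model_name: str,
    partition: str,
    transfer_id: str,
) -> bool:
    """Return True if the manifest for this model and transfer is present.

    `existing_keys` is whatever object listing your storage client returns.
    """
    return manifest_key(model_name, partition, transfer_id) in set(existing_keys)


if __name__ == "__main__":
    # Keys as they might come back from listing the bucket in the example.
    keys = [
        "orders/dt=2024-07-01/file1.parquet",
        "orders/dt=2024-07-01/file2.parquet",
        "_manifests/orders/dt=2024-07-01/"
        "manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
    ]
    done = transfer_is_complete(
        keys, "orders", "dt=2024-07-01", "aeb6efc1-73ec-405d-ae5e-d28b349b364c"
    )
    print(done)
```

Checking for the manifest rather than for the data files themselves avoids reading a partition while files are still being written.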
File format
Every manifest file follows the same structure. Here is a list of keys you can expect in every file, along with a complete sample file.

| Object Key | Value |
|---|---|
| version | Manifest format version. |
| transfer_id | Unique ID of the transfer job. |
| start_time | Start time of the transfer job, in UTC. |
| end_time | End time of the transfer job, in UTC. |
| model_id | Unique ID (UUID) of the data model. |
| model_name | Name of the data model. |
| bucket_name | Name of the object storage bucket. |
| bucket_prefix | Prefix of all objects created in this transfer. |
| manifest_file_key | Key of the manifest file (path and file name). |
| file_format | Format of the landed data (e.g. PARQUET). |
| transfer_type | Type of data transfer. Either FULL_REFRESH or INCREMENTAL. |
| signature | SHA-256 signature of the value stored in the files key. |
| signature_public_key | Public key that can be used to verify the signature. |
| files | Array of file objects. Each object contains a sha256_checksum, the etag of the file, and the data_file_key (the full path and name the file was written to). |
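The per-file `sha256_checksum` values can be used to confirm that the landed data files match what the transfer wrote. The sketch below is illustrative only: it assumes `sha256_checksum` is the hex-encoded SHA-256 digest of the file's bytes (the encoding is not stated here, so confirm it against a real manifest), and `read_object` is a hypothetical callable you supply that returns an object's bytes for a given key. Signature verification is deliberately omitted because the signing scheme is not described in this section.

```python
import hashlib
import json
from typing import Callable


def find_checksum_mismatches(
    manifest: dict, read_object: Callable[[str], bytes]
) -> list[str]:
    """Return the data_file_key of every file whose bytes do not match
    the sha256_checksum recorded in the manifest.

    Assumption: sha256_checksum is a hex digest of the raw file bytes.
    """
    mismatched = []
    for entry in manifest["files"]:
        key = entry["data_file_key"]
        actual = hashlib.sha256(read_object(key)).hexdigest()
        if actual != entry["sha256_checksum"].lower():
            mismatched.append(key)
    return mismatched


if __name__ == "__main__":
    # In-memory stand-in for the bucket; real code would call a storage SDK.
    bucket = {"orders/dt=2024-07-01/file1.parquet": b"example bytes"}
    manifest_text = json.dumps(
        {
            "files": [
                {
                    "data_file_key": key,
                    "sha256_checksum": hashlib.sha256(body).hexdigest(),
                }
                for key, body in bucket.items()
            ]
        }
    )
    manifest = json.loads(manifest_text)
    print(find_checksum_mismatches(manifest, bucket.__getitem__))
```

Passing the reader in as a callable keeps the check independent of any particular storage provider.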
Sample manifest file