Manifest files for object storage

Object storage destinations (s3, gcs, abs, and s3_compat) receive manifest files alongside the data files, so any downstream pipeline that processes this data can tell when a transfer has completed for a given model. One manifest file is written per model as part of every transfer. Manifest files live in their own `_manifests` directory within the object storage destination, and each manifest file is named `manifest_{transfer_id}.json`, as shown below.

    some_bucket
        orders
            dt=2024-07-01
                file1.parquet
                file2.parquet
        transactions
            dt=2024-07-01
                file1.parquet
        _manifests
            orders
                dt=2024-07-01
                    manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
            transactions
                dt=2024-07-01
                    manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json

These files are enabled by default on new object storage destinations.
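
A downstream job can treat the presence of a manifest as the completion signal for a model. The following is a minimal sketch of that check, assuming the key layout shown in the tree above (`_manifests/{model}/{partition}/manifest_{transfer_id}.json`). The function name `completed_transfers` and the `keys` input (a plain list of object keys, as returned by whatever listing call your storage client provides) are hypothetical and not part of any Prequel SDK.

```python
import re

# Assumed layout, taken from the directory tree above:
#   _manifests/<model>/<partition>/manifest_<transfer_id>.json
MANIFEST_KEY = re.compile(
    r"^_manifests/(?P<model>[^/]+)/(?P<partition>[^/]+)/"
    r"manifest_(?P<transfer_id>[^/]+)\.json$"
)


def completed_transfers(keys, model):
    """Return (partition, transfer_id) pairs that have a manifest for `model`."""
    found = []
    for key in keys:
        match = MANIFEST_KEY.match(key)
        if match and match.group("model") == model:
            found.append((match.group("partition"), match.group("transfer_id")))
    return found


# Example listing that mirrors the tree above (bucket name omitted).
keys = [
    "orders/dt=2024-07-01/file1.parquet",
    "orders/dt=2024-07-01/file2.parquet",
    "_manifests/orders/dt=2024-07-01/"
    "manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
]
print(completed_transfers(keys, "orders"))
```

A model with no manifest in the listing yields an empty list, which a pipeline could interpret as "transfer not yet complete".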

File format

Every manifest file follows the same structure. Here is a list of keys you can expect in every file, along with a sample complete file.

| Object Key | Value |
| --- | --- |
| `version` | Manifest format version. |
| `transfer_id` | Unique ID of the transfer job. |
| `start_time` | Start time of the transfer job, in UTC. |
| `end_time` | End time of the transfer job, in UTC. |
| `model_id` | Unique ID (UUID) of the data model. |
| `model_name` | Name of the data model. |
| `bucket_name` | Name of the object storage bucket. |
| `bucket_prefix` | Prefix of all objects created in this transfer. |
| `manifest_file_key` | Key of the manifest file (path and file name). |
| `file_format` | Format of the landed data (e.g. `PARQUET`). |
| `transfer_type` | Type of data transfer. Either `FULL_REFRESH` or `INCREMENTAL`. |
| `signature` | SHA-256 signature of the value stored in the `files` key. |
| `signature_public_key` | Public key that can be used to verify the signature. |
| `files` | Array of file objects. Each object contains a `sha256_checksum`, the `etag` of the file, and the `data_file_key` (the full path and name the file was written to). |

Sample manifest file

{
   "version": "2024-06-01",
   "transfer_id": "aeb6efc1-73ec-405d-ae5e-d28b349b364c",
   "start_time": "2024-06-01T07:28:34.028Z",
   "end_time": "2024-06-01T07:33:43.897Z",
   "model_id": "ee9542e3-6469-4e09-bdcc-abb50ca5643a",
   "model_name": "transactions",
   "bucket_name": "vendor-data",
   "bucket_prefix": "user-supplied-schema/transactions/2024-06-01",
   "manifest_file_key": "user-supplied-schema/transactions/2024-06-01/manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
   "file_format": "PARQUET",
   "transfer_type": "FULL_REFRESH",
   "signature": "f4d2e40ab3c0f5b8e3b88e022db4f7c54fb9f82c77ffa2e444c479b57843f57bea32812a73b9c8a786d3b908c434f9374e0498b9a2f23c90e2f578b9444382b0f",
   "signature_public_key": "-----BEGIN RSA PUBLIC KEY-----\nMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDmf2CGFxZU/Dx911t4K8l/G5zM\njGUvhP01k2YTLtBRXEdXLGZnmzuJTOsqyPOvj3+HU/iNUQ/mXIJu7wKTrA/glZ1i\n0Zcc18Ek0jyne03ikBDIdyeYZTGi37/UnNVLwkr2FxhHUHBgiS5msFjxjquC941D\n5Xluak1U1p6/ZFV0AwIDAQAB\n-----END RSA PUBLIC KEY-----",
   "files": [
      {
         "sha256_checksum": "sQMSpEILNgoQmarvDFonGQ==",
         "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150301.parquet"
      },
      {
         "sha256_checksum": "9c78d2e727b9f0b56a85b38dff88763c==",
         "etag": "9f84f7aacc09e05-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150405.parquet"
      },
      {
         "sha256_checksum": "1a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p==",
         "etag": "3d4fgsd7834f734b-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150645.parquet"
      }
   ]
}
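
As a sketch of how a consumer might read a manifest, the snippet below parses the JSON, checks that every key from the table above is present, and collects the data file keys. The helper names `read_manifest` and `data_file_keys` are hypothetical, the `signature` and `signature_public_key` values are placeholders, and signature or checksum verification is deliberately left out because the exact encoding is not specified here.

```python
import json

# Keys the table above lists as present in every manifest file.
REQUIRED_KEYS = {
    "version", "transfer_id", "start_time", "end_time", "model_id",
    "model_name", "bucket_name", "bucket_prefix", "manifest_file_key",
    "file_format", "transfer_type", "signature", "signature_public_key",
    "files",
}


def read_manifest(text):
    """Parse manifest JSON and fail early if a documented key is missing."""
    manifest = json.loads(text)
    missing = REQUIRED_KEYS - set(manifest)
    if missing:
        raise ValueError(f"manifest is missing keys: {sorted(missing)}")
    return manifest


def data_file_keys(manifest):
    """Return the object key of every data file landed by the transfer."""
    return [entry["data_file_key"] for entry in manifest["files"]]


# Trimmed copy of the sample manifest (one file entry, placeholder signature).
prefix = "user-supplied-schema/transactions/2024-06-01"
sample = json.dumps({
    "version": "2024-06-01",
    "transfer_id": "aeb6efc1-73ec-405d-ae5e-d28b349b364c",
    "start_time": "2024-06-01T07:28:34.028Z",
    "end_time": "2024-06-01T07:33:43.897Z",
    "model_id": "ee9542e3-6469-4e09-bdcc-abb50ca5643a",
    "model_name": "transactions",
    "bucket_name": "vendor-data",
    "bucket_prefix": prefix,
    "manifest_file_key": f"{prefix}/manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
    "file_format": "PARQUET",
    "transfer_type": "FULL_REFRESH",
    "signature": "<placeholder>",
    "signature_public_key": "<placeholder>",
    "files": [{
        "sha256_checksum": "sQMSpEILNgoQmarvDFonGQ==",
        "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
        "data_file_key": f"{prefix}/20240601150301.parquet",
    }],
})

manifest = read_manifest(sample)
print(manifest["transfer_type"], data_file_keys(manifest))
```

Failing on a missing key, rather than defaulting it, keeps a truncated or partially written manifest from being mistaken for a completed transfer.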
