Manifest files for object storage

Object storage destinations (s3, gcs, abs, and s3_compat) receive manifest files alongside the data files. A manifest lets any downstream pipeline that processes the data detect when a transfer has completed for a given model.

A manifest file is written for each model as part of every transfer. Manifest files live in their own _manifests directory within the object storage destination, and each manifest file is named manifest_{transfer_id}.json as shown below.

some_bucket
|-- orders
|   |-- dt=2024-07-01
|       |-- file1.parquet
|       |-- file2.parquet
|
|-- transactions
|   |-- dt=2024-07-01
|       |-- file1.parquet
|
|-- _manifests
|   |-- orders
|       |-- dt=2024-07-01
|           |-- manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
|   |-- transactions
|       |-- dt=2024-07-01
|           |-- manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json

These files are enabled by default on new object storage destinations.
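
As a minimal sketch of how a downstream pipeline might use this layout, the Python below builds the expected manifest key for one model partition and checks whether it appears in a listing of object keys. The helper names (manifest_key, transfer_completed) are hypothetical. The key layout is read off the directory tree above and assumes the _manifests directory sits at the root of the listing, with no extra prefix. It also assumes that the presence of the manifest is what signals completion.

```python
from typing import Iterable

# Directory name taken from the layout shown above.
MANIFEST_DIR = "_manifests"


def manifest_key(model_name: str, partition: str, transfer_id: str) -> str:
    """Build the expected manifest object key for one model partition.

    Assumes the layout _manifests/{model}/{partition}/manifest_{transfer_id}.json.
    """
    return f"{MANIFEST_DIR}/{model_name}/{partition}/manifest_{transfer_id}.json"


def transfer_completed(
    object_keys: Iterable[str],
    model_name: str,
    partition: str,
    transfer_id: str,
) -> bool:
    """Return True if the manifest for this transfer appears in the key listing."""
    return manifest_key(model_name, partition, transfer_id) in set(object_keys)
```

The key listing would come from whatever client you already use to list the bucket. The sketch deliberately takes plain strings so it does not depend on a particular storage SDK.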

File format

Every manifest file follows the same structure. The keys you can expect in every file are listed below, followed by a complete sample file.

Object Key            Value
--------------------  ------------------------------------------------------------
version               Manifest format version.
transfer_id           Unique ID of the transfer job.
start_time            Start time of the transfer job, in UTC.
end_time              End time of the transfer job, in UTC.
model_id              Unique ID (UUID) of the data model.
model_name            Name of the data model.
bucket_name           Name of the object storage bucket.
bucket_prefix         Prefix of all objects created in this transfer.
manifest_file_key     Key of the manifest file (path and file name).
file_format           Format of the landed data (e.g. PARQUET).
transfer_type         Type of data transfer. Either FULL_REFRESH or INCREMENTAL.
signature             SHA-256 signature of the value stored in the files key.
signature_public_key  Public key that can be used to verify the signature.
files                 Array of file objects. Each object contains a sha256_checksum,
                      the etag of the file, and the data_file_key (the full path and
                      name the file was written to).

{
   "version": "2024-06-01",
   "transfer_id": "aeb6efc1-73ec-405d-ae5e-d28b349b364c",
   "start_time": "2024-06-01T07:28:34.028Z",
   "end_time": "2024-06-01T07:33:43.897Z",
   "model_id": "ee9542e3-6469-4e09-bdcc-abb50ca5643a",
   "model_name": "transactions",
   "bucket_name": "vendor-data",
   "bucket_prefix": "user-supplied-schema/transactions/2024-06-01",
   "manifest_file_key": "user-supplied-schema/transactions/2024-06-01/manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
   "file_format": "PARQUET",
   "transfer_type": "FULL_REFRESH",
   "signature": "f4d2e40ab3c0f5b8e3b88e022db4f7c54fb9f82c77ffa2e444c479b57843f57bea32812a73b9c8a786d3b908c434f9374e0498b9a2f23c90e2f578b9444382b0f",
   "signature_public_key": "-----BEGIN RSA PUBLIC KEY-----\nMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDmf2CGFxZU/Dx911t4K8l/G5zM\njGUvhP01k2YTLtBRXEdXLGZnmzuJTOsqyPOvj3+HU/iNUQ/mXIJu7wKTrA/glZ1i\n0Zcc18Ek0jyne03ikBDIdyeYZTGi37/UnNVLwkr2FxhHUHBgiS5msFjxjquC941D\n5Xluak1U1p6/ZFV0AwIDAQAB\n-----END RSA PUBLIC KEY-----",
   "files": [
      {
         "sha256_checksum": "sQMSpEILNgoQmarvDFonGQ==",
         "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150301.parquet"
      },
      {
         "sha256_checksum": "9c78d2e727b9f0b56a85b38dff88763c==",
         "etag": "9f84f7aacc09e05-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150405.parquet"
      },
      {
         "sha256_checksum": "1a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p==",
         "etag": "3d4fgsd7834f734b-1",
         "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150645.parquet"
      }
   ]
}
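
To illustrate how the fields above might be consumed, here is a hedged Python sketch that works on an already-parsed manifest. It computes the transfer duration from start_time and end_time and reports any data_file_key entries that are missing from a listing of object keys. The function names are hypothetical. Checksum and signature verification are left out because the encoding of sha256_checksum and the signing scheme are not specified here.

```python
import json
from datetime import datetime, timedelta
from typing import Iterable, List


def parse_utc(timestamp: str) -> datetime:
    """Parse a manifest timestamp such as 2024-06-01T07:28:34.028Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def transfer_duration(manifest: dict) -> timedelta:
    """Return end_time minus start_time for a parsed manifest."""
    return parse_utc(manifest["end_time"]) - parse_utc(manifest["start_time"])


def missing_data_files(manifest: dict, object_keys: Iterable[str]) -> List[str]:
    """Return the data_file_key values that are absent from the key listing."""
    present = set(object_keys)
    return [
        entry["data_file_key"]
        for entry in manifest["files"]
        if entry["data_file_key"] not in present
    ]


def load_manifest(raw_json: str) -> dict:
    """Parse manifest text; raises json.JSONDecodeError on malformed input."""
    return json.loads(raw_json)
```

A pipeline could call load_manifest on the downloaded manifest body, then begin processing only when missing_data_files returns an empty list.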