SFTP
Configuring your SFTP server.
Prerequisites
- By default, SFTP uses key pair authentication for access. You will need the public key provided to you to configure your destination. It will look roughly like this:
    ssh-key <ssh_public_key_beginning_with_AAAA> some-comment
Step 1: Create a user on the SFTP server
Log in to the SFTP server and complete the steps below.
- Create group sftpwriter:
    sudo groupadd sftpwriter
- Create user sftpwriter:
    sudo useradd -m -g sftpwriter sftpwriter
- Switch to the sftpwriter user:
    sudo su - sftpwriter
- Create the .ssh directory:
    mkdir ~/.ssh
- Set permissions:
    chmod 700 ~/.ssh
- Navigate to the .ssh directory:
    cd ~/.ssh
- Create the authorized_keys file:
    touch authorized_keys
- Set permissions:
    chmod 600 authorized_keys
- Add the public key to the authorized_keys file. The key, including the "ssh-key" prefix and the comment, should be on a single line in the file, with no line breaks.
    echo "ssh-key <ssh_public_key_beginning_with_AAAA> sftpwriter-public-key" > authorized_keys
Step 2: Add your destination
Use the following details to complete the connection setup: host name, folder name, username, port and preferred delimiter character.
Write permissions at the SFTP root are required
In addition to write access within your configured <folder>, this destination writes per-transfer manifest files under a _manifests/ directory created at the root of the SFTP home/path. Ensure the SFTP user can create and write to _manifests at that root (even if your data lands under a subfolder). Manifests allow downstream systems to detect when a transfer is complete. Refer to the FAQ section below to understand how these files are organized.
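One way to confirm the root-level write requirement before connecting the destination is a quick local probe, run as the SFTP user on the server. The sketch below is only an illustration: `check_manifest_root` is a hypothetical helper name, and it assumes the SFTP home/path root maps to a local directory you can pass in (for example `$HOME` after `sudo su - sftpwriter`). Note that it creates `_manifests/` if the directory does not exist yet.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the current user can create
# _manifests/ under the given root and write a file inside it.
check_manifest_root() {
    root="$1"
    probe="$root/_manifests/.write_probe_$$"
    # Create _manifests/ if it is missing (side effect of this check).
    mkdir -p "$root/_manifests" 2>/dev/null || return 1
    # Write a probe file, then remove it again.
    touch "$probe" 2>/dev/null || return 1
    rm -f "$probe"
}

# Example usage (run as the SFTP user):
#   check_manifest_root "$HOME" && echo "root is writable" || echo "root is NOT writable"
```

A local probe like this does not exercise the SFTP layer itself (chroot settings, for instance), so treat a passing result as necessary rather than sufficient.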
Optional: PGP encryption
SFTP files are encrypted in transit by virtue of the SFTP protocol. We offer an optional, additional layer of encryption at rest for SFTP files using PGP encryption.
To enable PGP encryption, you will need to generate your own PGP public/private key pair and provide the public key on a per-destination basis (one for each destination) when configuring each PGP-enabled destination. The public key must be provided in ASCII armored format, beginning with the header line -----BEGIN PGP PUBLIC KEY BLOCK----- and ending with the tail line -----END PGP PUBLIC KEY BLOCK-----. For security reasons, only RSA and ECC keys are supported. RSA keys must have a key size of 2,048 bits or more.
If PGP encryption is enabled, both the manifest files and the landed data files will be encrypted in the PGP binary format. They will have an additional .pgp file extension appended to their filename. For example, encrypted CSV files will have filenames like your_data.csv.pgp.
It is not possible to enable or disable the PGP encryption settings of an existing destination. You must create a new destination from scratch with the new PGP configuration.
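Before pasting a PGP public key into the destination configuration, it can help to check that the exported file really is an ASCII armored public key block. The sketch below assumes GnuPG is the tool used to create and export the key (the document does not prescribe one); the `gpg` commands are shown as comments, and `is_armored_public_key`, `you@example.com`, and `public_key.asc` are illustrative names only.

```shell
#!/bin/sh
# Assumed GnuPG workflow (interactive; choose RSA with 2048 bits or more, or ECC):
#   gpg --full-generate-key
#   gpg --armor --export "you@example.com" > public_key.asc

# Hypothetical helper: succeeds if the first and last non-blank lines of the
# file are the armored public key header and tail lines.
is_armored_public_key() {
    file="$1"
    [ -s "$file" ] || return 1
    first="$(grep -v '^[[:space:]]*$' "$file" | head -n 1 | tr -d '\r')"
    last="$(grep -v '^[[:space:]]*$' "$file" | tail -n 1 | tr -d '\r')"
    [ "$first" = "-----BEGIN PGP PUBLIC KEY BLOCK-----" ] &&
        [ "$last" = "-----END PGP PUBLIC KEY BLOCK-----" ]
}

# Example usage:
#   is_armored_public_key public_key.asc && echo "looks armored" || echo "unexpected format"
```

This only checks the framing lines. It does not verify the key algorithm or size, so the RSA/ECC and 2,048-bit requirements still need to be confirmed separately.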
Permissions checklist
- SFTP user created with SSH key-based authentication.
- Provided public key added to ~/.ssh/authorized_keys for the SFTP user.
- SFTP user has write access to the configured folder.
- SFTP user can create and write to _manifests/ at the SFTP home/path root.
- Firewall or network perimeter allows the service's egress IP to connect on port 22.
- (If using PGP encryption) PGP public key in ASCII armored format is ready to provide during destination configuration. RSA keys must be 2,048 bits or larger.
Frequently Asked Questions
Q: How will the data appear in my SFTP server?
A: The data will be loaded in the configured file format (Parquet, CSV, or JSON/JSONL) in a predictable folder structure that can be easily parsed by downstream systems.
sftpwriter_home_folder/
├─ some_provided_folder/
│  ├─ some_table_a/
│  │  ├─ dt=2024-01-01/
│  │  │  ├─ 0_20240101181004.csv
│  │  │  ├─ 1_20240101184002.csv
│  │  ├─ dt=2024-01-02/
│  │  │  ├─ 0_20240102180123.csv
│  │  ├─ dt=2024-01-03/
│  │  │  ├─ 0_20240103182145.csv
│  ├─ some_table_b/
│  │  ├─ dt=2024-01-01/
│  │  │  ├─ 0_20240101186004.csv
│  │  ├─ dt=2024-01-02/
│  │  │  ├─ 0_20240102185123.csv
│  │  ├─ dt=2024-01-03/
│  │  │  ├─ 0_20240103187145.csv
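Because the layout is a table folder containing dt=<date> partitions, a downstream job can enumerate a table's files with standard tools. The following is a minimal sketch, assuming the job runs on a host that sees the SFTP tree as a local directory; `list_table_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print every data file of one table, sorted by
# partition (dt=...) and then by file name.
#   $1 = base folder, e.g. sftpwriter_home_folder/some_provided_folder
#   $2 = table name,  e.g. some_table_a
list_table_files() {
    base="$1"
    table="$2"
    # Only files that sit inside a dt=... partition directory are listed.
    find "$base/$table" -type f -path '*/dt=*/*' | LC_ALL=C sort
}

# Example usage:
#   list_table_files sftpwriter_home_folder/some_provided_folder some_table_a
```

The sort is plain byte order, which matches date order for the dt=YYYY-MM-DD partitions shown above. Within a partition, a leading file index of 10 or more would sort before 2, so a numeric sort would be needed if that ordering matters.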
Q: How is the SFTP connection secured?
A: Use SSH key-based authentication for a dedicated, least-privileged SFTP user. Restrict access to only the required directories (e.g., chroot), and allowlist the service's static egress IP at your network perimeter.
Q: What if I cannot provide write access at the root of my SFTP server?
A: Write privileges at the root are expected so that manifest files can be written there (see the question on transfer completion below). We recommend using a dedicated SFTP endpoint for data exports, for security and data isolation. If write permissions cannot be provided at the root, you can chroot your writer user to your target folder and configure the SFTP connection without a folder name. Both the manifest files and the data will then be written to this target folder.
Q: What file formats are supported?
A: Parquet (default/recommended), CSV, and JSON/JSONL.
Q: How are large datasets written?
A: Files are automatically split; multiple files may be written per model per transfer.
Q: How do I know when a transfer completed?
A: Each transfer writes one manifest JSON file per model under _manifests/ at the root. Files follow the pattern: _manifests/<model_name>/dt=<transfer_date>/manifest_<transfer_id>.json. Use these manifests to trigger downstream processing.
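A downstream trigger can be as simple as checking whether a manifest exists for a given model and date. The sketch below builds the directory from the pattern in the answer above; `manifest_dir` and `list_manifests` are hypothetical helper names. It assumes the transfer date uses the same YYYY-MM-DD form as the data partitions, and the `manifest_*.json*` glob is an assumption meant to also match manifests carrying an extra .pgp extension.

```shell
#!/bin/sh
# Hypothetical helper: directory that holds manifests for one model and date.
#   $1 = SFTP root, $2 = model name, $3 = transfer date (assumed YYYY-MM-DD)
manifest_dir() {
    printf '%s/_manifests/%s/dt=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: list manifest files for one model and date.
# Prints nothing (and still succeeds) if no transfer has landed yet.
list_manifests() {
    dir="$(manifest_dir "$1" "$2" "$3")"
    [ -d "$dir" ] || return 0
    find "$dir" -type f -name 'manifest_*.json*' | LC_ALL=C sort
}

# Example usage: start processing only once a manifest is present.
#   if [ -n "$(list_manifests /path/to/sftp/root some_table_a 2024-01-01)" ]; then
#       echo "transfer complete, start processing"
#   fi
```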
Q: Why do I sometimes see duplicates?
A: File-based destinations are append-oriented. The change-detection process uses a lookback window to prevent missed records, which can create duplicates across adjacent transfers. Downstream pipelines can deduplicate by primary key, prioritizing rows from the most recent transfer window.
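As an illustration of deduplicating by primary key, the sketch below merges CSV files so that a row from a later file replaces an earlier row with the same key. It rests on several assumptions that are not stated in this document: the files are unencrypted CSV with a header row, the primary key is the first column, fields contain no quoted commas or embedded newlines, and the files are passed oldest transfer first. `dedupe_by_key` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: merge CSV files, keeping the latest row per key.
# Pass files oldest first; rows in later files win.
dedupe_by_key() {
    awk -F',' '
        # First line of each file is a header: keep the first, skip the rest.
        FNR == 1 { if (NR == 1) header = $0; next }
        {
            # Remember the order in which keys were first seen.
            if (!($1 in latest)) order[++n] = $1
            # A later row with the same key overwrites the earlier one.
            latest[$1] = $0
        }
        END {
            if (header != "") print header
            for (i = 1; i <= n; i++) print latest[order[i]]
        }
    ' "$@"
}

# Example usage:
#   dedupe_by_key dt=2024-01-01/0_20240101181004.csv dt=2024-01-02/0_20240102180123.csv
```

For Parquet or JSON/JSONL output, or for CSV with quoted fields, a format-aware tool would be a safer choice than awk.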
Q: Can I provide my own SSH/SFTP public key? Where is the private key stored?
A: We do not support providing your own public key for security reasons. The private key is securely generated and stored in our system and is never shared externally.
Q: Can I provide my own PGP public key?
A: Yes, you can optionally provide your own PGP public key for us to encrypt the SFTP files with, on a per-destination basis (one for each destination). The public key must be provided in ASCII armored format.
Q: Is this PGP public key an alternative to the SSH/SFTP public key?
A: No. The PGP key is optional, while the SSH/SFTP key is mandatory, and the two serve distinct purposes. The SSH/SFTP key is used for authentication and for encryption in transit. The PGP key is not used for authentication; it adds another layer of encryption to files in transit and, importantly, ensures that the files remain encrypted at rest in the SFTP destination.
Q: When exactly does the PGP encryption occur, when enabled?
A: The PGP encryption is applied within our system, after the data is pulled from the source. The data remains PGP-encrypted in-transit between our system and the SFTP server, and stays PGP-encrypted at rest in the SFTP destination.
Q: Will the PGP-encrypted files be generated in the armored (ASCII) format, or in binary?
A: Files are encrypted in the binary format only.
Q: Are manifest files also encrypted, or only the landed data?
A: If PGP encryption is enabled, both the manifest files and landed data will be encrypted.
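Since both data and manifest files arrive as binary PGP files with a .pgp suffix, a consumer needs to decrypt them before use. The sketch below assumes GnuPG is installed and that the matching private key is already in the local keyring; `decrypted_name` and `decrypt_file` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name by stripping the .pgp suffix.
# Fails if the file name does not end in .pgp.
decrypted_name() {
    case "$1" in
        *.pgp) printf '%s\n' "${1%.pgp}" ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: decrypt one file next to the original.
# Assumes the private key for the configured public key is in the gpg keyring.
decrypt_file() {
    out="$(decrypted_name "$1")" || return 1
    gpg --output "$out" --decrypt "$1"
}

# Example usage:
#   decrypt_file your_data.csv.pgp    # writes your_data.csv
```

gpg detects binary versus armored input on its own, so no extra flag is needed for the binary format.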
Q: Are there any constraints on the PGP public keys?
A: Only RSA and ECC keys are supported. RSA keys must be 2,048 bits or larger.