ClickHouse
Configuring your ClickHouse destination.
Prerequisites
- If your ClickHouse security posture requires IP whitelisting, have our data syncing service's static IP available during the following steps. It will be required in Step 1.
Step 1: Allow access
Create a rule in a security group or firewall settings to whitelist:
- incoming connections to your host and port (usually
9440
) from the static IP. - outgoing connections from ports
1024
to65535
to the static IP.
Step 2: Create writer user
Create a database user to perform the writing of the data.
- Open a connection to your ClickHouse database.
- Create a user for the data transfer by executing the following SQL command.
CREATE USER <username>@'%' IDENTIFIED BY '<some-password>';
- Grant user required privileges on the database.
GRANT CREATE, INSERT, DROP, ALTER, OPTIMIZE, SHOW ON <database.*> TO <username>@'%';
grant CREATE TEMPORARY TABLE, S3 on *.* to <username>@'%';
Understanding the
CREATE TEMPORARY TABLE, S3
permissionsThe
CREATE TEMPORARY TABLE
andS3
permissions are required to efficiently transfer data to ClickHouse. Under the hood, these permissions are used to stage data in object storage as compressed files, COPY INTO temporary tables, and finally merge into the target tables. By definition, the temporary table will not exist outside of the session.
Step 3: Setup staging bucket
ClickHouse sources require a staging bucket to efficiently transfer data. Configure your staging bucket using one of the following types of ClickHouse supported object storage:
- S3
- GCS
- Implicit
Using the
implicit
bucket optionClickHouse supports the ability to configure staging resources with environment credentials. If this setting is enabled on your ClickHouse cluster, you may choose to use the configured implicit staging resources using the
implicit
option for the staging bucket selection.
Step 4: Add your destination
Securely share your host name, port, cluster, database name, schema name, username, password, and staging bucket details with us to complete the connection.
Understanding the
database
vs.schema
fields (connection database
vs.write database
)Depending on the version of your integration, you may be asked for both a
database
andschema
, or aconnection database
andwrite database
.
database
(also referred to asconnection_database
): is the database used to establish the connection with ClickHouse.schema
(also referred to aswrite_database
): is the database/schema within which data will be writtenThese can be (and often are) the same values, but do not need to be.
Using the ClickHouse data
Querying ClickHouse data without duplicates
The resulting ClickHouse tables use the ReplacingMergeTree table engine in order to efficiently upsert changes. To properly query this data, the
FINAL
keyword must be used when selecting from these tables guarantee duplicates are removed. For example:SELECT * FROM schema.table FINAL WHERE foo = bar ORDER BY foo LIMIT 10;
Updated 9 months ago