Skip to main content

Clickhouse Migration Guide

This version removes the option to use "normalization" with clickhouse. It also changes the schema and database of Airbyte's "raw" tables to be compatible with the new Destinations V2 format. These changes will likely require updates to downstream dbt / SQL models. After this update, Airbyte will only produce the ‘raw’ v2 tables, which store all content in JSON. These changes remove the ability to do deduplicated syncs with Clickhouse. (Clickhouse has an overview)[[https://clickhouse.com/docs/en/integrations/dbt]] for integrating with dbt If you are interested in the Clickhouse destination gaining the full features of Destinations V2 (including final tables), click [[https://github.com/airbytehq/airbyte/discussions/35339]] to register your interest.

This upgrade will ignore any existing raw tables and will not migrate any data to the new schema. For each stream, you could perform the following query to migrate the data from the old raw table to the new raw table:

-- assumes your database was 'default'
-- replace `{{stream_name}}` with replace your stream name

CREATE TABLE airbyte_internal.default_raw__stream_{{stream_name}}
(
`_airbyte_raw_id` String,
`_airbyte_extracted_at` DateTime64(3, 'GMT') DEFAULT now(),
`_airbyte_loaded_at` DateTime64(3, 'GMT') NULL,
`_airbyte_data` String,
PRIMARY KEY(`_airbyte_raw_id`)
)
ENGINE = MergeTree;

INSERT INTO `airbyte_internal`.`default_raw__stream_{{stream_name}}`
SELECT
`_airbyte_ab_id` AS "_airbyte_raw_id",
`_airbyte_emitted_at` AS "_airbyte_extracted_at",
NULL AS "_airbyte_loaded_at",
_airbyte_data AS "_airbyte_data"
FROM default._airbyte_raw_{{stream_name}};

Airbyte will not delete any of your v1 data.

We have split the raw and final tables into their own schemas, which in clickhouse is analogous to a database. For the Clickhouse destination, this means that we will only write into the raw table which will live in the airbyte_internal database. The tables written into this schema will be prefixed with either the default database provided in the DB Name field when configuring clickhouse (but can also be overridden in the connection). You can change the "raw" database from the default airbyte_internal by supplying a value for Raw Table Schema Name.

For Example:

  • DB Name: default
  • Stream Name: my_stream

Writes to airbyte_intneral.default_raw__stream_my_stream

where as:

  • DB Name: default
  • Stream Name: my_stream
  • Raw Table Schema Name: raw_data

Writes to: raw_data.default_raw__stream_my_stream