Concepts → Log-Based Incremental Load

Overview

For releases before 2024.1, Incorta supports only a query-based approach to loading data incrementally. This approach relies on either a LAST_UPDATED_TIMESTAMP column or a column with a monotonically increasing maximum value (MAX_VALUE) to track changes. If the source table does not include columns that identify the last update time or newly inserted records, query-based incremental load cannot be used.

To address these challenges, Incorta introduced log-based incremental load starting with release 2024.1, enabling reliable incremental loading for inserts and updates without requiring specific columns, while also eliminating performance impact on source systems.

Starting with release 2025.7, the log-based incremental load mechanism was enhanced to support delete operations in the source system through a soft delete approach in Incorta.

Notes

After upgrading from a previous release to 2025.7 or later:

  • When you create a new table using log-based incremental load, a special column is added to indicate delete operations. You can use this column:
    • In queries, to filter out soft-deleted rows.
    • In a purge job, to physically remove these rows from Incorta.
  • This column is added to existing tables that use log-based incremental load when you validate the tables.
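As a minimal sketch of the first use case, the following filters out soft-deleted rows on the client side. The delete-indicator column name below is hypothetical; check your table for the actual name of the column Incorta adds.

```python
# Hypothetical name of the delete-indicator column added by Incorta.
DELETE_FLAG = "IS_DELETED"

rows = [
    {"id": 1, "name": "alpha", DELETE_FLAG: False},
    {"id": 2, "name": "beta", DELETE_FLAG: True},   # soft-deleted at source
    {"id": 3, "name": "gamma", DELETE_FLAG: False},
]

# Keep only rows that are still live in the source system.
live_rows = [r for r in rows if not r[DELETE_FLAG]]
print([r["id"] for r in live_rows])  # → [1, 3]
```

In Incorta itself, the equivalent would be a filter condition on this column in your queries or business views rather than client-side filtering.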

Prerequisites

To use log-based incremental load, do the following:

  1. Install and configure Apache Kafka and Kafka Connect.
  2. Set up the source database so that the Debezium connector can capture transaction changes.
  3. Configure the Debezium connector. Debezium is an open-source distributed platform for Change Data Capture (CDC).
  4. Disable snapshots while configuring Debezium.
  5. Ensure that the Debezium connector is configured to send data types to Incorta by adding the propagate property.
  6. Ensure that source tables have primary keys; log-based incremental load supports only physical database tables with primary keys.
  7. If you use the Apicurio schema registry (available starting with connector version 2.2.5.0), configure it before enabling the feature.
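As an illustrative sketch only, a Debezium PostgreSQL connector registered with Kafka Connect might look like the following. The connector name, host, credentials, and topic prefix are placeholders, and property names and values (notably the snapshot and type-propagation settings) should be verified against the Debezium documentation for your connector version.

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.example.com",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.dbname": "inventory",
    "topic.prefix": "incorta",
    "snapshot.mode": "never",
    "datatype.propagate.source.type": ".*"
  }
}
```

Here, `topic.prefix` corresponds to the Kafka Topic Prefix you later provide in Incorta, `snapshot.mode` disables the initial snapshot, and `datatype.propagate.source.type` adds source data type metadata to the change events.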

Supported connectors

The log-based incremental load is currently supported for the following SQL-based connectors:

Limitations
  • You might face issues with the INTERVAL data types in Oracle and PostgreSQL.
  • In connector versions before 2.2.3.0, log-based incremental load supports only Kafka topics with a single partition.

How it works

After completing the prerequisites, do the following:

Step 1: Create a data source with log-based incremental load enabled

  1. In the Data Manager, create or update a data source using a supported connector.
  2. Turn on the Enable the Log-Based Incremental Load toggle.
  3. Set the related configurations. For details, see Log-based incremental load configurations.
  4. Save the changes.

Log-based incremental load configurations

The following table describes the available properties when you enable log-based incremental load for a data source using a supported connector.

| Property | Control | Description |
|---|---|---|
| Kafka Topic Prefix | text box | Enter the prefix of your Kafka topic names that the CDC tool (Debezium, for example) uses to route all schema changes. |
| Kafka Cluster URL(s) | text box | Enter a comma-separated list of your Kafka cluster URLs in the following format: listener.security.protocol://your.host.name:port. For example: PLAINTEXT://localhost:9092, SSL://myKafkaServer:9093 |
| Consumer Configurations | text box | A line-separated list of connection properties in the format propertyName=propertyValue. For a complete list of properties, refer to the Kafka online documentation > Configuration > Consumer Configs. |
| Security Protocol | dropdown list | Select the security protocol your Kafka server uses. Available options are: Use Consumer Properties (if you have provided the required configurations in the consumer properties), None, SASL PLAINTEXT, and SSL. |
| SASL PLAINTEXT > SASL Mechanism for Client Connection | dropdown list | Specify the authentication method used within the Simple Authentication and Security Layer (SASL) framework to verify identities. Available options are: PLAIN (simple username/password) and GSSAPI (Kerberos). |
| SASL PLAINTEXT > JAAS Config | text box | Enter the Java Authentication and Authorization Service (JAAS) configurations. |
| SASL PLAINTEXT > JAAS Username | text box | Enter the JAAS username. |
| SASL PLAINTEXT > JAAS Password | text box | Enter the JAAS password. |
| SASL PLAINTEXT > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the password identifier in the integrated secret manager instead of its value. |
| SASL PLAINTEXT > JAAS Password Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the JAAS password. |
| SSL > Protocol | dropdown list | The protocol to use for TLS communication. Available options are: TLS v1, TLS v1.1, and TLS v1.2. |
| SSL > Endpoint Identification Algorithm | text box | Specify the algorithm for hostname validation. An empty string disables hostname verification, which is the default. Enter https to enable hostname verification. |
| SSL > Trust Store File | button | To upload a trust store file, select Choose File. In the Finder or File Explorer, select your trust store file, such as kafka.client.truststore.jks. |
| SSL > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the trust store file identifier in the integrated secret manager instead of uploading the file itself. |
| SSL > Trust Store File Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the trust store file. |
| SSL > Trust Store Password | text box | Specify the password of the trust store file. Without a password, the trust store file remains available, but without integrity checks. |
| SSL > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the trust store password identifier in the integrated secret manager instead of its value. |
| SSL > Trust Store Password Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the trust store password. |
| SSL > Key Store File | button | To upload a key store file, select Choose File. In the Finder or File Explorer, select your key store file, such as kafka.client.keystore.jks. |
| SSL > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the key store file identifier in the integrated secret manager instead of uploading the file itself. |
| SSL > Key Store File Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the key store file. |
| SSL > Key Store Password | text box | Specify the password of the key store file. |
| SSL > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the key store password identifier in the integrated secret manager instead of its value. |
| SSL > Key Store Password Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the key store password. |
| SSL > Key Password | text box | Specify the password of the private key in the key store file. |
| SSL > Use Secret Manager | checkbox | Available only when you configure a secret manager in the CMC. Select the checkbox to enter the key password identifier in the integrated secret manager instead of its value. |
| SSL > Key Password Identifier | text box | Available when you select the Use Secret Manager checkbox. Enter the identifier of the key password. |
| Use Schema Registry | dropdown list | Available starting with connector version 2.2.5.0. Specify whether you use a schema registry to store the Kafka schema definition instead of including it in the messages. The only available option for now is Apicurio. |
| Apicurio > Schema Registry URL | text box | Enter the Apicurio host in this format: http://HOST:PORT. |
| Apicurio > Apicurio Schema Registry Cache Expiry (Hours) | spin box | Specify the number of hours for which Incorta caches the schema entries. |
| Apicurio > Schema Registry Cache Size (Entries) | spin box | Specify the maximum number of schema entries that Incorta can cache. |
| Apicurio > Message Format | dropdown list | Select the message format. Available options are: AVRO (optimizes the message size) and JSON. |
| Apicurio > AVRO > Enable Apicurio Confluent Compatible Mode | toggle | Enable this option so that the connector supports Confluent serializers for interoperability with Confluent clients and tools. |
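As a sketch of what the Consumer Configurations text box might contain for an SSL-secured cluster, the following uses standard Kafka consumer property names; the file path and password are placeholders, and the exact set of properties you need depends on your cluster's security setup.

```properties
security.protocol=SSL
ssl.truststore.location=/path/to/kafka.client.truststore.jks
ssl.truststore.password=********
ssl.endpoint.identification.algorithm=https
```

If you provide these properties here, you can set the Security Protocol dropdown to Use Consumer Properties instead of configuring the dedicated SSL fields.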

Step 2: Create a dataset based on the data source

  1. In the Table Editor, create or update a dataset based on the data source you have created in the previous step.
  2. Turn on the Incremental toggle.
  3. For the Incremental Type option, select Log-based (CDC).
  4. Save the changes.

Step 3: Load data incrementally

You can run a manual or scheduled incremental load job to update Incorta physical tables with the changes in the data source.

Note

In general, new tables require an initial full load before running an incremental load job, whether query-based or log-based. If you skip this step, the first incremental load job performs a full load instead.

Additional considerations

Optimized performance using schema registry

Connector version 2.2.5.0 introduces schema registry integration with Apicurio, which improves incremental data ingestion performance by moving schema definitions out of individual Kafka messages and into a schema registry service. Eliminating the schema metadata embedded in each message reduces message payload sizes, which accelerates log-based incremental loads and lowers network bandwidth and storage requirements.

Additionally, the schema registry supports the AVRO message format, which provides better binary compression and serialization efficiency than verbose JSON, further improving data transfer speeds and reducing infrastructure costs for high-volume streaming workloads.
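The payload-size effect can be illustrated with a small sketch: one message that embeds its full schema versus one that carries only a registry schema ID. The field names and schema shape below are illustrative, not the exact wire format Debezium or Apicurio uses.

```python
import json

# An illustrative record schema and one row of change data.
schema = {
    "type": "struct",
    "fields": [
        {"field": "id", "type": "int64"},
        {"field": "name", "type": "string"},
        {"field": "updated_at", "type": "int64"},
    ],
}
row = {"id": 42, "name": "widget", "updated_at": 1718000000000}

# Without a schema registry, every message repeats the full schema.
embedded = json.dumps({"schema": schema, "payload": row})

# With a registry, messages carry only a small schema ID plus the payload;
# consumers fetch the schema once from the registry and cache it.
registered = json.dumps({"schema_id": 1, "payload": row})

print(len(embedded), len(registered))
assert len(registered) < len(embedded)
```

The saving shown here is per message, so it compounds across the millions of change events a high-volume CDC stream produces; AVRO's binary encoding shrinks the payload portion further.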