SparkX Metastore Sync

Introduction

The SparkX Metastore is a MySQL database that the Advanced SQLi uses to maintain the metadata of the physical schema tables and verified business views it can access. The Advanced SQLi exposes the tables and views listed in the SparkX Metastore to its consumers, such as Tableau and Power BI, so that they can query them.

Note

As of 2024.1.3, the Advanced SQLi supports the following: optimized tables; non-optimized tables that do not have security filters, formula columns, or encrypted columns; and verified business views. Thus, only the metadata of these tables and views is synced to the SparkX Metastore.

Types of the Metastore sync

The SparkX Metastore must be synced with the Incorta metadata database to get the latest versions of the physical and business schemas that the Advanced SQLi can access. The sync of the SparkX Metastore is triggered either on demand or automatically.

Auto sync

The SparkX Metastore sync is automatically triggered in the following cases:

  • Creating a new schema using the Schema Wizard
  • Schema updates, including actions such as adding, editing, or deleting tables

Additionally, the sync can be automatically triggered depending on the sync configurations:

  • When inx.sparksql.sync.periodic.start is set to true, the sync is triggered automatically in the following cases:

    • The Analytics service startup
    • The tenant startup, and after creating or importing a tenant
      Note

      The default is true for Cloud installations and false for On-Premises installations.

  • When the inx.sparksql.sync.periodic.interval option is set to a value greater than 0, the sync is triggered periodically at the specified interval. The default is 720 minutes.
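
The way these two settings combine can be illustrated with a small shell function. This is only a model of the behavior described here and in the description of inx.sparksql.sync.periodic.start in the configuration table, not Incorta code; the name sync_offsets is hypothetical, and the handling of an interval of 0 assumes that a startup sync still fires when the start option is true.

```shell
# Illustrative model, not Incorta code: prints the minutes after tenant startup
# at which the first few automatic syncs would fire.
# sync_offsets is a hypothetical helper name.
sync_offsets() {
  start="$1"      # inx.sparksql.sync.periodic.start (true or false)
  interval="$2"   # inx.sparksql.sync.periodic.interval, in minutes
  count="$3"      # how many sync times to print

  # Assumption: an interval of 0 disables the periodic sync, so at most
  # the startup sync remains.
  if [ "$interval" -eq 0 ]; then
    if [ "$start" = "true" ]; then echo 0; fi
    return 0
  fi

  # With start=true the first sync fires at startup (minute 0);
  # otherwise the module waits one full interval first.
  if [ "$start" = "true" ]; then t=0; else t="$interval"; fi

  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="$out${out:+ }$t"
    t=$((t + interval))
    i=$((i + 1))
  done
  echo "$out"
}

sync_offsets true 720 3    # prints: 0 720 1440
sync_offsets false 720 3   # prints: 720 1440 2160
```

With the default interval of 720 minutes, enabling the start option only shifts the schedule so that the first sync happens at startup instead of 12 hours later.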

On-demand sync

You can also manually trigger the sync of the SparkX Metastore. There are multiple methods to perform an on-demand sync:

  • Using the Sync Spark Metastore option, available in the CMC: Clusters > <cluster_name> > Tenants > <tenant_name>. It syncs all supported physical and business schemas.
  • Using the sync_spark_metadata_all_schemas.sh script that exists under <incorta_installation_path>/IncortaNode/bin. It syncs all supported physical and business schemas.
  • Using the sync_spark_metadata_specific_schemas.sh script that exists under <incorta_installation_path>/IncortaNode/bin. It syncs only the specified schemas if they are supported.

Regular vs forced sync

In a regular sync, only physical and business schemas that have been created or updated (that is, schemas with new versions) are written to the SparkX Metastore.

A regular sync does not interrupt users who are using the Advanced SQLi. Auto sync and on-demand sync using the Sync Spark Metastore CMC option are both regular syncs.

In a forced sync, the Metastore is completely cleared and then populated with the metadata of all supported physical schema objects and business views. You can also force the sync for specific schemas; in this case, only the records related to the specified schemas are deleted and recreated based on the latest updates in the Incorta metadata database.
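
The difference between the two modes can be sketched as a selection rule. The function below is a conceptual model only, not how Incorta implements the sync; select_schemas and the NAME:STATE argument format are hypothetical, chosen to make the rule easy to read.

```shell
# Conceptual model only; select_schemas is a hypothetical name.
# Each argument after the mode is NAME:STATE, where STATE is
# "updated" (the schema has a new version) or "unchanged".
select_schemas() {
  mode="$1"   # regular or forced
  shift
  selected=""
  for entry in "$@"; do
    name="${entry%%:*}"
    state="${entry#*:}"
    # A regular sync keeps only schemas with a new version;
    # a forced sync rewrites every supported schema.
    if [ "$mode" = "forced" ] || [ "$state" = "updated" ]; then
      selected="$selected${selected:+ }$name"
    fi
  done
  echo "$selected"
}

select_schemas regular Sales:updated Audit:unchanged   # prints: Sales
select_schemas forced Sales:updated Audit:unchanged    # prints: Sales Audit
```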

Force the sync for all schemas

To force the Metastore sync for all schemas:

  1. In the terminal, navigate to <incorta_installation_path>/IncortaNode/bin.
  2. Edit the sync_spark_metadata_all_schemas.sh script, and update the Analytics URL, user name, and password.
  3. Set the $incorta_cmd sync_spark_meadata $session parameter to true to force the sync.

Note

Running the sync_spark_metadata_all_schemas.sh script with the $incorta_cmd sync_spark_meadata $session parameter set to false performs a regular sync, similar to the one you trigger using the Sync Spark Metastore CMC option. It syncs all supported physical and business schemas that have new versions.
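
As a rough sketch, the edited line might look like the one inside the function below. The position of the true/false flag directly after $session is an assumption inferred from step 3 and from the syntax of the specific-schemas script, so check it against your copy of the script. The incorta_cmd and session values are stand-ins that only print the line, and print_sync_all is a hypothetical wrapper.

```shell
# Dry-run sketch. In the real script, incorta_cmd and session are set by earlier
# lines; the stand-ins below only print the line instead of contacting a server.
incorta_cmd="echo DRY-RUN:"
session="SESSION_PLACEHOLDER"

# print_sync_all is a hypothetical wrapper used here for illustration.
print_sync_all() {
  force="$1"   # true = forced sync, false = regular sync
  # Assumed shape of the edited line: the flag follows $session.
  # The command name is copied verbatim from step 3.
  $incorta_cmd sync_spark_meadata "$session" "$force"
}

print_sync_all true   # prints: DRY-RUN: sync_spark_meadata SESSION_PLACEHOLDER true
```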

Forcing the sync of the SparkX Metastore is required in the following cases:

  • New installations of 2024.1.x
  • Upgrading to 2024.1.x, including Cloud upgrades from 2024.1.0 to 2024.1.3
  • Creating or importing tenants if syncing at the tenant startup is disabled, that is, when inx.sparksql.sync.periodic.start is set to false in the SparkX sync configurations.

Sync specific schemas

To sync the Metastore for specific schemas:

  1. In the terminal, navigate to <incorta_installation_path>/IncortaNode/bin.

  2. Edit the sync_spark_metadata_specific_schemas.sh script, and update the required parameters, such as the Incorta URL, user name, and password.

  3. In the following line, set <INVALIDATE_SCHEMAS> to true to force the sync or to false to sync only the updated schemas, and set <SCHEMAS_NAMES_LIST> to a comma-separated list of the schemas you want to sync:
    $incorta_cmd sync_spark_metadata_schemas $session <INVALIDATE_SCHEMAS> <SCHEMAS_NAMES_LIST>

    Example: $incorta_cmd sync_spark_metadata_schemas $session true Sales,BSch_Customers,Audit
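
If you keep the schema names as separate values, a small helper can assemble the line from step 3 for you. This is a minimal sketch: build_specific_sync_line is a hypothetical helper that is not shipped with Incorta, and it only prints the line to paste into the script rather than running anything.

```shell
# Hypothetical helper: validates the force flag, joins the schema names
# with commas, and prints the line to place in the script.
build_specific_sync_line() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_specific_sync_line <true|false> <schema>..." >&2
    return 1
  fi
  force="$1"
  shift
  case "$force" in
    true|false) ;;
    *) echo "first argument must be true or false" >&2; return 1 ;;
  esac
  schemas=""
  for name in "$@"; do
    schemas="$schemas${schemas:+,}$name"
  done
  # \$ keeps $incorta_cmd and $session literal so the script expands them later.
  printf '%s\n' "\$incorta_cmd sync_spark_metadata_schemas \$session $force $schemas"
}

build_specific_sync_line true Sales BSch_Customers Audit
# prints: $incorta_cmd sync_spark_metadata_schemas $session true Sales,BSch_Customers,Audit
```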

SparkX Sync Configurations

The following table lists the Metastore sync configuration keys with their value types and default values. These configurations are set in the following locations:

  • Incorta Metadata Database - the TENANT_CONFIG table
  • node.properties
  • The Analytics service's service.properties

Note

You can also set these configurations by using the update-property TMT command.

| Key | Value Type | Default Value | Description |
|---|---|---|---|
| inx.sparksql.jdbc.domain | String | "localhost" | The domain (IP address or domain name) where Kyuubi is hosted. Note: For custom installations, do not use localhost or 127.0.0.1. |
| inx.sparksql.jdbc.port | Number | 10009 | The TCP port on which Kyuubi listens. Example: 10009 |
| inx.sparksql.sync.periodic.interval | Number | 720 | The interval, in minutes, at which the periodic sync triggers a sync request. The accepted range is 1 to 1440 minutes (1 minute to 24 hours). Example: 180 (3 hours). Note: Setting the value to 0 disables the periodic sync. |
| inx.sparksql.sync.periodic.start | Boolean | false | Instructs the metadata sync module to trigger a sync on tenant startup. If set to true, the module triggers a sync on tenant startup, then waits for the interval window before firing the next sync request. If cleared (or set to false), the module first waits for the interval window before firing the first sync request, then fires again after every interval. |
| inx.sparksql.jdbc.secured | Boolean | false | Instructs the metadata sync module to use a secured connection with Kyuubi. Example: true |
| inx.sparksql.security.truststore.path | String | "" | The path to the truststore file. Used only if inx.sparksql.jdbc.secured is set to true. Example: /path/to/truststore |
| inx.sparksql.security.truststore.password | String | "" | The password used to read the truststore. The password is in plain text. |
| inx.sparksql.jdbc.username | String | Tenant owner username | The username that the metadata sync module uses to connect to Kyuubi. Leave this configuration to be set automatically to the tenant owner. |
| inx.sparksql.jdbc.password | Password String | Auto-generated password | The password that the metadata sync module uses to connect to Kyuubi. Leave this configuration to be set automatically. It is set to an auto-generated password every time the Analytics service starts up. |

Notes

  • "" denotes an empty string.
  • String value configurations are set without quotation marks.
  • labs.sql.x.enabled must be set to true.
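
Before changing inx.sparksql.sync.periodic.interval, it can help to check the value against the documented range. The function below is a minimal sketch of such a check; check_sync_interval is a hypothetical helper, not an Incorta tool, and it only encodes the rules stated in the table (0 disables the periodic sync, 1 to 1440 is accepted).

```shell
# Hypothetical pre-check for inx.sparksql.sync.periodic.interval (minutes).
check_sync_interval() {
  value="$1"
  # Reject empty input and anything that is not made of digits only.
  case "$value" in
    ''|*[!0-9]*) echo "invalid: not a non-negative integer"; return 1 ;;
  esac
  if [ "$value" -eq 0 ]; then
    echo "periodic sync disabled"
  elif [ "$value" -le 1440 ]; then
    echo "periodic sync every $value minutes"
  else
    echo "invalid: above 1440 minutes"
    return 1
  fi
}

check_sync_interval 180   # prints: periodic sync every 180 minutes
```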

Additional Considerations

  • Sync is limited to schema names with a maximum length of 250 characters.
  • To have the Spark metadata synced correctly, make sure of the following before syncing:
    • The Advanced SQL Interface toggle in the CMC > Server Configurations > Incorta Labs is turned on.
    • The primary Analytics service, the Advanced SQL service, and SparkX (or Spark starting 2024.7.x) are started and running.
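
The schema name limit above can be checked ahead of a sync with a short function. This is a sketch only: check_schema_names is a hypothetical helper, and ${#name} counts characters in bash but bytes in some other shells, so results may differ for names with multi-byte characters.

```shell
# Hypothetical pre-check: reports schema names longer than 250 characters.
# Returns 0 if every name is within the limit, 1 otherwise.
check_schema_names() {
  status=0
  for name in "$@"; do
    if [ "${#name}" -gt 250 ]; then
      echo "too long (${#name} characters): $name"
      status=1
    fi
  done
  return "$status"
}

check_schema_names Sales BSch_Customers Audit && echo "all schema names fit"
```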