SparkX Metastore Sync
Introduction
The SparkX Metastore is a MySQL database that the Advanced SQLi uses to maintain the metadata of the physical schema tables and verified business views it can access. The Advanced SQLi exposes the list of tables and views in the SparkX Metastore to its consumers, such as Tableau and Power BI, so that they can query these tables and views.
As of 2024.1.3, the Advanced SQLi supports optimized tables, non-optimized tables that do not have security filters, formula columns, or encrypted columns, and verified business views. Thus, only the metadata of these tables and views is synced to the SparkX Metastore.
Types of the Metastore sync
The SparkX Metastore must be synced with the Incorta metadata database to get the latest versions of the physical and business schemas that the Advanced SQLi can access. The sync of the SparkX Metastore is triggered either on demand or automatically.
Auto sync
The Spark Metastore sync is automatically triggered in the following cases:
- Creating a new schema using the Schema Wizard
- Schema updates, including actions such as adding, editing, or deleting tables
Additionally, the sync can be automatically triggered depending on the sync configurations:
- When the `inx.sparksql.sync.periodic.start` option is set to `true`, the sync is triggered automatically in the following cases:
  - The Analytics service startup
  - The tenant startup, and after creating or importing a tenant
  Note: The default is `true` for Cloud installations and `false` for On-Premises installations.
- When the `inx.sparksql.sync.periodic.interval` option is set to a value greater than 0, the sync is triggered periodically at the specified interval. The default is 720 minutes.
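As an illustration only, enabling the startup sync with a 3-hour periodic interval could look like the following if you choose to set the keys in the Analytics service's `service.properties` file (the file path is a placeholder, and the keys can also be set through any of the other methods listed under SparkX Sync Configurations below):

```bash
# Sketch: append the periodic sync keys to the Analytics service's
# service.properties. The path is a placeholder; adjust it to your
# installation, and restart the service afterward (assumption).
cat >> "<path_to_the_analytics_service>/service.properties" <<'EOF'
inx.sparksql.sync.periodic.start=true
inx.sparksql.sync.periodic.interval=180
EOF
```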
On-demand sync
You can also manually trigger the sync of the SparkX Metastore. There are multiple methods to perform an on-demand sync:
- Using the Sync Spark Metastore option, available in the CMC: Clusters > <cluster_name> > Tenants > <tenant_name>. It syncs all supported physical and business schemas.
- Using the `sync_spark_metadata_all_schemas.sh` script that exists under `<incorta_installation_path>/IncortaNode/bin`. It syncs all supported physical and business schemas.
- Using the `sync_spark_metadata_specific_schemas.sh` script that exists under `<incorta_installation_path>/IncortaNode/bin`. It syncs only the specified schemas if they are supported.
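For example, a minimal on-demand run with one of the scripts could look like the following sketch (the connection details inside the script must be edited first, as described in the following sections):

```bash
# Sketch: run the all-schemas sync script on the Incorta node.
cd "<incorta_installation_path>/IncortaNode/bin"

# Syncs all supported physical and business schemas; whether this is a
# regular or a forced sync depends on the parameter set inside the script
# (see "Force the sync for all schemas" below).
./sync_spark_metadata_all_schemas.sh
```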
Regular vs forced sync
In a regular sync, only physical and business schemas that have been created or updated (that is, schemas with new versions) are populated into the SparkX Metastore.
A regular sync does not interrupt users who are using the Advanced SQLi. Auto sync and on-demand sync using the Sync Spark Metastore CMC option are both regular syncs.
In a forced sync, the Metastore is completely cleared and then populated with the metadata of all supported physical schema objects and business views. You can also force the sync for specific schemas; in this case, only the records related to the specified schemas are deleted and recreated based on the latest updates in the Incorta metadata database.
Force the sync for all schemas
To force the Metastore sync for all schemas:
- In the terminal, navigate to `<incorta_installation_path>/IncortaNode/bin`.
- Edit the `sync_spark_metadata_all_schemas.sh` script, and update the Analytics URL, user name, and password.
- Set the `$incorta_cmd sync_spark_meadata $session` parameter to `true` to force the sync.
Using the `sync_spark_metadata_all_schemas.sh` script to sync the Metastore while setting the `$incorta_cmd sync_spark_meadata $session` parameter to `false` performs a regular sync similar to the sync you trigger using the Sync Spark Metastore CMC option. It syncs all supported physical and business schemas that have new versions.
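As a sketch only, the difference between a forced and a regular sync inside the script comes down to the value of that parameter (`$incorta_cmd` and `$session` are variables defined earlier in the script; only one of the two lines exists in the actual script):

```bash
# Inside sync_spark_metadata_all_schemas.sh (sketch)

# Forced sync: clears the Metastore and repopulates it with all supported
# physical schema objects and business views.
$incorta_cmd sync_spark_meadata $session true

# Regular sync: populates only the schemas that have new versions, similar
# to the Sync Spark Metastore CMC option.
$incorta_cmd sync_spark_meadata $session false
```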
Forcing the sync of the Spark Metastore is required in the following cases:
- New installations of 2024.1.x
- Upgrading to 2024.1.x, including Cloud upgrades from 2024.1.0 to 2024.1.3
- Creating or importing tenants if syncing at the tenant startup is disabled, that is, when the `inx.sparksql.sync.periodic.start` option is set to `false` in the SparkX sync configurations.
Sync specific schemas
To sync the Metastore for specific schemas:
- In the terminal, navigate to `<incorta_installation_path>/IncortaNode/bin`.
- Edit the `sync_spark_metadata_specific_schemas.sh` script, and update the required parameters, such as the Incorta URL, user name, and password.
- In the following line, specify the schemas you want to sync as a comma-separated list, and specify whether you want to force the sync (`true`) or just get the updated schemas (`false`):
  `$incorta_cmd sync_spark_metadata_schemas $session <INVALIDATE_SCHEMAS> <SCHEMAS_NAMES_LIST>`
  Example: `$incorta_cmd sync_spark_metadata_schemas $session true Sales,BSch_Customers,Audit`
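After saving the edits, running the script applies the sync. A sketch, assuming the script is executable and is run from its own directory:

```bash
cd "<incorta_installation_path>/IncortaNode/bin"

# With the example line above, this forces the sync of only the Sales,
# BSch_Customers, and Audit schemas.
./sync_spark_metadata_specific_schemas.sh
```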
SparkX Sync Configurations
The following table lists all the Metastore sync configuration keys, value types, and default values, which are set in the following:
- Incorta Metadata Database - the TENANT_CONFIG table
- `node.properties`
- The Analytics service's `service.properties`

You can also set these configurations by using the `update-property` TMT command.
Key | Value Type | Default Value | Description |
---|---|---|---|
inx.sparksql.jdbc.domain | String | "localhost" | The domain (can be IP or domain name) where Kyuubi is hosted. Note: For custom installations, do not use localhost or 127.0.0.1. |
inx.sparksql.jdbc.port | Number | 10009 | The TCP port on which Kyuubi listens. Example: 10009 |
inx.sparksql.sync.periodic.interval | Number | 720 | Sets the interval at which periodic sync triggers a sync request. The value is in minutes. The accepted range is [1, 1440], that is, 1 minute is the minimum value and 24 hours is the maximum. Example: 180 (3 hours) Note: Setting the value to 0 disables the periodic sync. |
inx.sparksql.sync.periodic.start | Boolean | false | Instructs the metadata sync module to trigger a sync on tenant startup. If this configuration is set to true, the module triggers a sync on tenant startup, then waits for the interval window to fire the next sync request. If this configuration is cleared (or set to false), the module first waits for the interval window to fire the first sync request, then fires periodically after every time interval. |
inx.sparksql.jdbc.secured | Boolean | false | Instructs the metadata sync to use a secured connection with Kyuubi. Example: true |
inx.sparksql.security.truststore.path | String | "" | Sets the path to the truststore file. This is used only if inx.sparksql.jdbc.secured is set to true. Example: /path/to/truststore |
inx.sparksql.security.truststore.password | String | "" | Sets the password used to read the truststore. The password is in plain text. |
inx.sparksql.jdbc.username | String | Tenant Owner username | The username used for the metadata sync module to connect to Kyuubi. This configuration should be left to be automatically set to the tenant owner. |
inx.sparksql.jdbc.password | Password String | Auto-generated password | The password used for the metadata sync to connect to Kyuubi. This configuration should be left to be automatically set. It is set to an auto-generated password every time the Analytics service starts up. |
""
denotes empty string.- String value configurations are set without quotations.
- The
labs.sql.x.enabled
must be set totrue
.
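For illustration only, a configuration that points the metadata sync module at a secured Kyuubi endpoint might combine several of the keys above as follows (the host name, truststore path, and password are placeholders, and the target properties file depends on where you choose to set the configurations):

```bash
# Sketch: placeholder values for a secured connection to Kyuubi.
cat >> "<path_to_the_analytics_service>/service.properties" <<'EOF'
inx.sparksql.jdbc.domain=kyuubi.internal.example.com
inx.sparksql.jdbc.port=10009
inx.sparksql.jdbc.secured=true
inx.sparksql.security.truststore.path=/path/to/truststore
inx.sparksql.security.truststore.password=<truststore_password>
EOF
```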
Additional Considerations
- Sync is limited to schema names with a maximum length of 250 characters.
- To have the Spark metadata synced correctly, make sure of the following before syncing:
  - The Advanced SQL Interface toggle in the CMC > Server Configurations > Incorta Labs is turned on.
  - The primary Analytics service, the Advanced SQL service, and SparkX (or Spark, starting 2024.7.x) are started and running.