References → Advanced SQLi Setup and Configuration
This guide provides the instructions to set up and configure the Cloud or On-Prem Advanced SQL Interface.
How to Activate the Cloud Advanced SQL Interface
To activate the Advanced SQL Interface, follow these steps in the Cloud Admin Portal (CAP):
- Navigate to Cluster Configurations.
- Enable the Enable Advanced SQL Interface toggle.
- The Loader and Analytics services restart automatically.
How to Activate On-Prem Advanced SQL Interface
Prerequisites
The following prerequisites are required to configure the Advanced SQL Interface On-Prem:
- The cluster must have two machines, one for each Spark version:
  - Custom Spark 3.4.1 (SparkX): serves queries and responses for the Advanced SQL Interface.
  - Classic Spark: handles MVs, SQLi, and loader tasks.
- An installation of MySQL 8.
- Configured MySQL user access for the Metastore.
- Configured Kyuubi and Spark Metastore.
- An active and synced Spark Metastore.
- Available Kyuubi ports; refer to Advanced SQLi Ports.
- For Azure clusters: a copy of the core-site.xml file.
Install MySQL Server
If your cluster already uses MySQL 8 as its metadata database, you may proceed to the next step. For clusters using Oracle as their metadata database, you must install MySQL 8 to begin setting up the Spark Metastore. An installation guide for MySQL is available on the MySQL documentation site.
Configure MySQL User Access for the Metastore
The Spark Metastore, the previously installed MySQL database that Incorta uses for maintaining metadata, exclusively supports the synchronization of optimized tables and verified business views. Users with the necessary authorization can use the Spark Metastore to access a comprehensive list of schemas and verified views within business schemas.
You must grant the MySQL user the privileges to create and drop schemas using the following command:
GRANT CREATE, DROP, INDEX, REFERENCES, INSERT, DELETE, UPDATE, SELECT, ALTER, LOCK TABLES, EXECUTE ON *.* TO 'SPARK_Metastore_Admin_USER_NAME'@'localhost';
After completing the access grant, run the following command so that previously cached privileges are reloaded:
FLUSH PRIVILEGES;
Replace SPARK_Metastore_Admin_USER_NAME with the username of the MySQL user you want to grant these privileges to. Replace localhost with the appropriate hostname or IP address if you are granting access from a different location. After running these commands, the user has the privileges needed to create and drop schemas.
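As a concrete illustration, the two statements above can be generated from a small shell function and piped to the mysql client. The function name and the demo user/host values below are hypothetical placeholders; the GRANT statement itself is the one from this guide.

```shell
#!/bin/sh
# Sketch: emit the Metastore grant statements for a given user/host pair,
# ready for review or for piping into the mysql client. The function name
# and the demo user/host values are placeholders -- substitute your own.
grant_sql() {
  printf "GRANT CREATE, DROP, INDEX, REFERENCES, INSERT, DELETE, UPDATE, SELECT, ALTER, LOCK TABLES, EXECUTE ON *.* TO '%s'@'%s';\n" "$1" "$2"
  printf "FLUSH PRIVILEGES;\n"
}

# Review the statements, then apply them, e.g.:
#   grant_sql spark_metastore_admin localhost | mysql -u root -p
grant_sql spark_metastore_admin localhost | tee /tmp/grant-demo.sql
```

Generating the statements first lets you review the exact privilege list before applying it to the Metastore database.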
Synchronization with the Spark Metastore is not triggered by every event but specifically occurs under the following circumstances:
- Creation of a new schema using the Schema Wizard
- Schema Updates, including actions such as adding, editing, or deleting tables
- Analytics service startup (see inx.sparksql.sync.periodic.start in metadata sync configurations below for Cloud/On-Prem defaults)
- On-Demand Sync initiated through the "Sync Spark Metastore" option available in the CMC in Clusters > cluster-name > Tenants > tenant-name > More Options.
Synchronization is limited to schema names with a maximum length of 250 characters.
Syncing the Spark Metastore is required in the following cases:
- Upgrading to 2024.1.x
- New installations of 2024.1.x
- Creating or importing tenants if syncing at tenant startup is disabled, that is, when inx.sparksql.sync.periodic.start is set to false in the SparkX sync configurations.
Set up and configure Spark
Spark for the Advanced SQLi is automatically installed during a fresh installation or an upgrade under the following path: /incorta/kyuubi/IncortaNode/sparkX
After the installation, you need to do the following:
- Start SparkX manually through your terminal using the following command: IncortaNode/startSparkX.sh
- Retrieve and save the SparkX master URL from the terminal logs. The URL will be in this format: spark://<url>:<port_number>
To stop SparkX, run the following command: IncortaNode/stopSparkX.sh
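The master-URL step can be scripted: the snippet below greps the first spark:// URL out of a log file. The log file path and the sample log line are fabricated for the demo; point the function at the real SparkX startup output instead.

```shell
#!/bin/sh
# Sketch: pull the SparkX master URL (spark://<url>:<port_number>) out of
# startup log output. The sample log line below is made up for the demo.
extract_master_url() {
  grep -o 'spark://[^ ]*' "$1" | head -n 1
}

# Demo fixture -- replace with your real SparkX log file:
printf 'INFO Master: Starting Spark master at spark://node1.example.com:7077\n' \
  > /tmp/sparkx-demo.log
extract_master_url /tmp/sparkx-demo.log   # prints spark://node1.example.com:7077
```

Saving the extracted URL avoids copying it by hand from the scrolling startup logs.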
Set up and configure Kyuubi and Spark Metastore
For Azure instances, copy the core-site.xml file to INCORTA_HOME/IncortaNode/kyuubi/runtime/jars/
For new clusters:
Using the CMC, you can set up and configure both Kyuubi and Spark using the following steps:
- Log in to the CMC.
- Create a new cluster.
- In the cluster wizard, make sure to add:
- Spark Metastore Database details
- Kyuubi Service details
- Continue using the wizard to complete the cluster setup.
- In the Review step, ensure you enter all the needed data.
- Select Create.
For upgraded clusters:
- Log in to the CMC.
- Edit the existing cluster.
- Enter the Metastore Database details: Metastore Host, Metastore Username, and Metastore Password.
- Select Save.
- Under Clusters > <your_cluster> > Nodes, add a new Advanced SQL service.
- Enter service details, and then select Create.
Start the Advanced SQL service
After setting up and configuring Kyuubi and Spark, do the following:
- Navigate to the CMC, and enable the Enable Advanced SQL Interface toggle that you can find under Cluster Configurations > Server Configurations > Incorta Labs.
- Restart all the Analytics and Loader services.
- Start the Advanced SQL service that you have added. There are two ways to start the Advanced SQL service:
- By using the CMC: navigate to the cluster > Nodes, and then start the Advanced SQL service.
- By using the terminal: run the following command: ./<incorta_installation>/kyuubi/IncortaNode/startKyuubi.sh <service_name>
- To get the name of the Advanced SQL service using the terminal, navigate to /<incorta_installation>/IncortaNode/kyuubi/services/, and then run cat services.index.
- To stop the Advanced SQL service, use this command: ./<incorta_installation>/kyuubi/IncortaNode/stopKyuubi.sh <service_name>
- To check the status of the Advanced SQL service, use this command: ./<incorta_installation>/kyuubi/IncortaNode/kyuubiStatus.sh <service_name>
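The name lookup and start command above can be sketched as a short script. The installation path and the service ID below are demo placeholders; on a real node, the services directory and services.index already exist.

```shell
#!/bin/sh
# Sketch: read the Advanced SQL service name from services.index and build
# the start command. INCORTA_HOME and the service ID are demo placeholders.
INCORTA_HOME=/tmp/incorta-demo
services_dir="$INCORTA_HOME/IncortaNode/kyuubi/services"

# Demo fixture -- a real node already has this directory:
mkdir -p "$services_dir"
printf 'fcdb8023-8e23-4212-b0c3-3f6d5ea1beab\n' > "$services_dir/services.index"

# The first line of services.index is the service name:
service_name="$(head -n 1 "$services_dir/services.index")"
echo "$INCORTA_HOME/kyuubi/IncortaNode/startKyuubi.sh $service_name"
```

Scripting the lookup is useful when the same start/stop/status commands are run repeatedly, since the service name is a generated identifier rather than something memorable.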
Sync the Metastore
Once the Advanced SQL Interface is active, it’s important to sync the metastore so that it is populated with all the metadata from physical and business schemas. Do this manually once the installation is finished. After the force sync, the metastore updates on a set schedule (every 12 hours by default) and whenever a schema or business schema is updated.
To have the Spark Metadata synced correctly, make sure of the following before syncing:
- The Advanced SQL Interface toggle is turned on.
- The Analytics service is started and running.
- The Advanced SQL service is started and running.
To run a force sync, please complete the following steps:
- Go to the script named sync_spark_metadata_all_schemas.sh under <installation_path>/IncortaNode/bin.
- Edit the script in the terminal: update the Analytics URL, username, and password.
- Ensure you set the force sync parameter to true in the following line: $incorta_cmd sync_spark_meadata $session true
- Run the command: ./sync_spark_metadata_all_schemas.sh
For details, refer to SparkX Metastore Sync.
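The force-sync edit can also be applied non-interactively. The demo below writes a throwaway copy containing the line quoted in the steps above and flips its last argument to true with sed; on a real node, point SCRIPT at the actual file under <installation_path>/IncortaNode/bin instead.

```shell
#!/bin/sh
# Sketch: set the force-sync argument to true with sed instead of editing
# by hand. SCRIPT points at a throwaway demo copy here, not the real script.
SCRIPT=/tmp/sync_spark_metadata_all_schemas.sh
printf '%s\n' '$incorta_cmd sync_spark_meadata $session false' > "$SCRIPT"

# Flip the trailing argument on the sync line from false to true:
sed -i 's/\(sync_spark_meadata \$session \)false/\1true/' "$SCRIPT"
cat "$SCRIPT"
```

Anchoring the sed pattern to the sync line avoids accidentally rewriting any other occurrence of false elsewhere in the script.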
Additional Considerations
- The username format to use is: <username>%<tenant-name>
- To authenticate with Incorta, you must generate a personal access token (PAT) and use it as the password when connecting to the Advanced SQLi.
- If you want to use OAuth 2.0 for authentication instead of PAT, you must enable OAuth 2.0-based authentication for JDBC connection from the CMC under Default Tenant Configurations > Integrations.
- The Advanced SQLi retrieves verified views only. Querying a non-verified view results in a “not found” error.
- Using the Advanced SQLi requires your data to be synced with the Spark Metastore.
- If you use a tenant with the same name as a previously deleted one, contact Incorta Support to avoid issues while using the Advanced SQL Interface.
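As an example of the username format, a connection via beeline (the JDBC CLI that ships with Kyuubi) combines the user and tenant into one username and passes the PAT as the password. The host, port, tenant, and token values below are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: assemble the Advanced SQLi connection pieces. All values here are
# hypothetical placeholders -- substitute your own host, port, tenant, and PAT.
INCORTA_USER=admin
TENANT=demo_tenant
PAT='<personal-access-token>'

# The JDBC username follows the <username>%<tenant-name> format:
JDBC_USER="${INCORTA_USER}%${TENANT}"
echo "beeline -u 'jdbc:hive2://kyuubi-host:10009' -n '$JDBC_USER' -p '$PAT'" \
  | tee /tmp/sqli-connect-demo.txt
```

The same username/password pair applies to any JDBC client, not just beeline; only the connection URL syntax differs per tool.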
Advanced SQLi primary Analytics service
The Advanced SQLi communicates with only one Analytics service: the primary Analytics service. For this reason, it is recommended to enable the Advanced SQLi only on clusters with a single Analytics service.
Starting with 2024.1.3, you can enable the Advanced SQLi on a cluster with multiple Analytics services. However, you must designate only one Analytics service as the primary service that the Advanced SQLi uses and communicates with.
No configuration is required for the Analytics service you want to designate as the primary service. For all other services, edit the service.properties or node.properties file (if applicable), and add the following properties:
labs.sql.x.enabled=false
engine.enable.null.handling=true
Restart all the impacted Analytics services; there is no need to restart the primary service.
- You must apply the same steps when adding a new Analytics service to the cluster.
- For Cloud installations, contact Incorta Support to set these configurations.
- If the primary Analytics service is not running, the Advanced SQLi will not function properly.
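The property edit for each non-primary service can be scripted as below. The file path is a demo placeholder, and the two property lines are the ones listed above; the check makes the script safe to rerun.

```shell
#!/bin/sh
# Sketch: append the two non-primary properties to a service.properties
# file, skipping any property that is already set. The path is a demo
# placeholder -- use the real service.properties (or node.properties).
PROPS=/tmp/demo-service.properties
touch "$PROPS"

for p in 'labs.sql.x.enabled=false' 'engine.enable.null.handling=true'; do
  key="${p%%=*}"
  # Append only if the key is not already present in the file:
  grep -q "^${key}=" "$PROPS" || printf '%s\n' "$p" >> "$PROPS"
done
# Then restart this (non-primary) Analytics service.
```

Run the same script on every non-primary service, including any Analytics service added to the cluster later.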
Non-English characters support
Spark and Hive do not support non-English characters; however, there are a few settings you can adjust in your third-party tool to work around this limitation.
For Power BI, check the Use Native Query option in the Advanced Options window. With this option enabled, there is a limitation with schema names over 128 characters.
For DBeaver and DB Visualizer, update all needed templates in the SQL Statements section to include extra backticks (`). Note that tables and columns with “$” in their names cannot be listed in the UI, but you can use them normally in queries.
For Tableau, non-English tables cannot be listed while using the Spark Simba connector; use the Incorta Tableau connector instead.
Additional information
You can find the Kyuubi configurations, logs, and work folders in the following location:
/incorta/kyuubi/IncortaNode/kyuubi/services/fcdb8023-8e23-4212-b0c3-3f6d5ea1beab
To secure SparkX metadata synchronization, make sure you have the following:
- A secured Kyuubi instance up and running.
- The same truststore as the keystore used by Kyuubi.
- The truststore password.
Advanced Configuration Steps
- Update the following configurations (either in your DB, or in the node.properties or analytics service.properties file):
inx.sparksql.jdbc.secured = true
inx.sparksql.security.truststore.path = /path/to/truststore
inx.sparksql.security.truststore.password = p@s$w0rd
- Restart the Analytics service.
For more information, refer to SparkX Metastore Sync Configurations.