SparkX Access to Cloud Storage
This content applies to 2024.1.x On-Premises installations.
The unified Spark version bundled with Incorta starting with 2024.7.x includes the required jars under `/<incorta_installation_path>/IncortaNode/spark/jars`.
Starting with 2024.1.3, SparkX requires access to the Parquet files to enable the discovery of non-optimized tables via the Advanced SQLi. For Cloud installations and On-Premises installations on local servers, SparkX can automatically access the Parquet files after you configure the Advanced SQLi. However, for On-Premises tenants that use a cloud storage file system, such as Azure, AWS, or Google Cloud Storage (GCS), you must manually apply additional configurations to grant SparkX access to the Parquet files on these cloud storage services.
These configurations are required to allow Incorta to monitor the queries run via the Advanced SQLi against tenants stored on these cloud storage services.
After setting the required configurations, you must restart all services, including the Loader, Analytics, and Advanced SQLi services. You must also restart SparkX by running the following commands: `./stopSparkX.sh` and `./startSparkX.sh`.
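For reference, here is a minimal shell sketch of the SparkX restart. The location of the `stopSparkX.sh` and `startSparkX.sh` scripts is an assumption here (adjust the path to where they reside in your installation), and the Loader, Analytics, and Advanced SQLi services should be restarted through your usual mechanism first:

```bash
# Minimal sketch: restart SparkX after applying the configurations.
# Assumption: stopSparkX.sh and startSparkX.sh live under the IncortaNode
# directory; replace the placeholder with your actual installation path.
cd "/<incorta_installation_path>/IncortaNode"
./stopSparkX.sh
./startSparkX.sh
```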
GCS configurations
Here are the steps required to allow SparkX to access Parquet files on Google Cloud Storage (GCS); a consolidated shell sketch follows these steps:
- Copy the `core-site.xml` file from `/<incorta_installation_path>/IncortaNode/runtime/lib/`, for example, to `/IncortaNode/sparkX/conf`.
- Copy the `gcs-connector-hadoop3-2.2.11-shaded.jar` file from the `/IncortaNode/spark/jars` directory to `/IncortaNode/sparkX/custom-jars`.
- Navigate to `/<incorta_installation_path>/IncortaNode/kyuubi/services/<service_GUID>/conf/` and add the following configurations to `kyuubi-defaults.conf`:
  `spark.driver.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
  `spark.executor.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
- Restart the services.
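The following is a minimal shell sketch that consolidates the GCS steps above, assuming the directory layout shown in the steps; `INCORTA_HOME` and `SERVICE_GUID` are placeholders for your installation path and Kyuubi service GUID:

```bash
# Placeholders: set these to match your installation.
INCORTA_HOME="/<incorta_installation_path>"
SERVICE_GUID="<service_GUID>"

# Copy core-site.xml to the SparkX conf directory.
cp "$INCORTA_HOME/IncortaNode/runtime/lib/core-site.xml" \
   "$INCORTA_HOME/IncortaNode/sparkX/conf/"

# Copy the GCS connector jar to the SparkX custom-jars directory.
cp "$INCORTA_HOME/IncortaNode/spark/jars/gcs-connector-hadoop3-2.2.11-shaded.jar" \
   "$INCORTA_HOME/IncortaNode/sparkX/custom-jars/"

# Append the classpath settings to kyuubi-defaults.conf.
cat >> "$INCORTA_HOME/IncortaNode/kyuubi/services/$SERVICE_GUID/conf/kyuubi-defaults.conf" <<EOF
spark.driver.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
spark.executor.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
EOF

# Then restart the services as described above.
```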
Azure configurations
Here are the steps required to allow SparkX to access Parquet files on Microsoft Azure; a consolidated shell sketch follows these steps:
- Copy the `core-site.xml` file from `/<incorta_installation_path>/IncortaNode/runtime/lib/`, for example, to `/IncortaNode/sparkX/conf`.
- Copy the following .jar files from `/IncortaNode/spark/jars` to `/IncortaNode/sparkX/custom-jars`:
  - `azure-data-lake-store-sdk-2.3.9.jar`
  - `azure-keyvault-1.0.0.jar`
  - `azure-storage-7.0.1.jar`
  - `hadoop-azure-3.3.4.jar`
  - `hadoop-azure-datalake-3.3.4.jar`
- Navigate to `/<incorta_installation_path>/IncortaNode/kyuubi/services/<service_GUID>/conf/` and add the following configurations to `kyuubi-defaults.conf`:
  `spark.driver.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
  `spark.executor.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
- Restart the services.
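As with GCS, the Azure steps can be consolidated into a short shell sketch; `INCORTA_HOME` and `SERVICE_GUID` are placeholders, and the jar names match the list above:

```bash
INCORTA_HOME="/<incorta_installation_path>"   # placeholder: your installation path
SERVICE_GUID="<service_GUID>"                 # placeholder: your Kyuubi service GUID

# Copy core-site.xml to the SparkX conf directory.
cp "$INCORTA_HOME/IncortaNode/runtime/lib/core-site.xml" \
   "$INCORTA_HOME/IncortaNode/sparkX/conf/"

# Copy the Azure connector jars listed above.
for jar in azure-data-lake-store-sdk-2.3.9.jar \
           azure-keyvault-1.0.0.jar \
           azure-storage-7.0.1.jar \
           hadoop-azure-3.3.4.jar \
           hadoop-azure-datalake-3.3.4.jar; do
  cp "$INCORTA_HOME/IncortaNode/spark/jars/$jar" \
     "$INCORTA_HOME/IncortaNode/sparkX/custom-jars/"
done

# Append the classpath settings to kyuubi-defaults.conf.
cat >> "$INCORTA_HOME/IncortaNode/kyuubi/services/$SERVICE_GUID/conf/kyuubi-defaults.conf" <<EOF
spark.driver.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
spark.executor.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
EOF
```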
AWS configurations
Here are the steps required to allow SparkX to access Parquet files on Amazon Web Services (AWS); a consolidated shell sketch follows these steps:
- Copy the `core-site.xml` file from `/<incorta_installation_path>/IncortaNode/runtime/lib/`, for example, to `/IncortaNode/sparkX/conf`.
- Copy the following .jar files from `/IncortaNode/spark/jars` to `/IncortaNode/sparkX/custom-jars`:
  - `aws-java-sdk-1.12.262.jar`
  - `aws-java-sdk-core-1.12.262.jar`
  - `aws-java-sdk-dynamodb-1.12.262.jar`
  - `aws-java-sdk-s3-1.12.262.jar`
  - `hadoop-aws-3.3.4.jar`
- Navigate to `/<incorta_installation_path>/IncortaNode/kyuubi/services/<service_GUID>/conf/` and add the following configurations to `kyuubi-defaults.conf`:
  `spark.driver.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
  `spark.executor.extraClassPath=<incorta_installation_path>/IncortaNode/sparkX/custom-jars/*`
- Restart the services.
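The AWS steps follow the same pattern; here is a minimal shell sketch with `INCORTA_HOME` and `SERVICE_GUID` as placeholders and the jar names taken from the list above:

```bash
INCORTA_HOME="/<incorta_installation_path>"   # placeholder: your installation path
SERVICE_GUID="<service_GUID>"                 # placeholder: your Kyuubi service GUID

# Copy core-site.xml to the SparkX conf directory.
cp "$INCORTA_HOME/IncortaNode/runtime/lib/core-site.xml" \
   "$INCORTA_HOME/IncortaNode/sparkX/conf/"

# Copy the AWS connector jars listed above.
for jar in aws-java-sdk-1.12.262.jar \
           aws-java-sdk-core-1.12.262.jar \
           aws-java-sdk-dynamodb-1.12.262.jar \
           aws-java-sdk-s3-1.12.262.jar \
           hadoop-aws-3.3.4.jar; do
  cp "$INCORTA_HOME/IncortaNode/spark/jars/$jar" \
     "$INCORTA_HOME/IncortaNode/sparkX/custom-jars/"
done

# Append the classpath settings to kyuubi-defaults.conf.
cat >> "$INCORTA_HOME/IncortaNode/kyuubi/services/$SERVICE_GUID/conf/kyuubi-defaults.conf" <<EOF
spark.driver.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
spark.executor.extraClassPath=$INCORTA_HOME/IncortaNode/sparkX/custom-jars/*
EOF
```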