Configure a Tenant on ADLS Gen2
You can configure an Incorta Tenant on Azure Data Lake Storage (ADLS) Gen2, using it as a file system to save files, including Parquet and snapshot files. Here are the steps:
- Create a core-site.xml file
- Copy the core-site.xml file to your Incorta host
- Declare an environment variable on your Incorta host
- Create an ADLS Gen2 Tenant in the CMC
- Whitelist the ADLS Gen2 endpoints
- Verify the Wildfly JAR version
Create a core-site.xml file
Create a file and name it core-site.xml. You will need to add your ADLS Gen2 credentials to the file. The following is the content of the core-site.xml file:
<configuration>
  <!-- other configuration -->
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/$$TENANT_ID$$/oauth2/token</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>$$CLIENT_ID$$</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>$$CLIENT_SECRET$$</value>
  </property>
  <!-- other configuration -->
</configuration>
Replace $$TENANT_ID$$, $$CLIENT_ID$$ and $$CLIENT_SECRET$$ with your ADLS Gen2 credentials. You can get these from the Azure web console.
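The placeholder replacement can also be scripted. The following is a minimal sketch, not an Incorta tool: render_core_site is a hypothetical helper name, and it assumes you saved the content above as a template file that still contains the $$TENANT_ID$$, $$CLIENT_ID$$ and $$CLIENT_SECRET$$ markers. It does not XML-escape the values, so a value that contains & or < would still need escaping by hand.

```shell
# Sketch only: render_core_site is a hypothetical helper, not an Incorta tool.
# Usage: render_core_site TEMPLATE OUTPUT TENANT_ID CLIENT_ID CLIENT_SECRET
render_core_site() (
  template=$1
  output=$2

  # Escape \, & and | so sed treats each value as literal replacement text.
  esc() { printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'; }

  tenant=$(esc "$3")
  client=$(esc "$4")
  secret=$(esc "$5")

  # [$][$] matches the literal "$$" markers used in the template.
  sed -e 's|[$][$]TENANT_ID[$][$]|'"$tenant"'|g' \
      -e 's|[$][$]CLIENT_ID[$][$]|'"$client"'|g' \
      -e 's|[$][$]CLIENT_SECRET[$][$]|'"$secret"'|g' \
      "$template" > "$output"
)
```

For example, render_core_site core-site.template.xml core-site.xml followed by your three values writes a filled-in core-site.xml next to the template. The template file name here is only an illustration.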
Encrypt credentials (optional)
You can enter credentials in plain text or store them encrypted. To encrypt them, use the Hadoop command line interface (CLI) to create the credentials in a keystore as follows:
cd <INCORTA_INSTALLATION_PATH>/IncortaNode/hadoop/bin
./hadoop credential create fs.azure.account.oauth2.client.id -value <client_id> -provider jceks://file/<KEY_STORE_PATH>.jceks
./hadoop credential create fs.azure.account.oauth2.client.secret -value <client_secret> -provider jceks://file/<KEY_STORE_PATH>.jceks
./hadoop credential create fs.azure.account.oauth2.client.endpoint -value <endpoint> -provider jceks://file/<KEY_STORE_PATH>.jceks
Edit the core-site.xml file as follows:
<configuration>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://file/<KEY_STORE_PATH>.jceks</value>
    <description>Path to interrogate for protected credentials.</description>
  </property>
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
</configuration>
Copy the core-site.xml file to your Incorta host
Copy the core-site.xml file to each of the following locations on the host running your Incorta nodes:
<INCORTA_INSTALLATION_PATH>/cmc/lib/core-site.xml
<INCORTA_INSTALLATION_PATH>/cmc/tmt/core-site.xml
<INCORTA_INSTALLATION_PATH>/IncortaNode/hadoop/etc/hadoop/core-site.xml
<INCORTA_INSTALLATION_PATH>/IncortaNode/runtime/lib/core-site.xml
<INCORTA_INSTALLATION_PATH>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib/core-site.xml
You will need to restart Spark, the CMC, and the Analytics and Loader services after you copy the core-site.xml file.
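Copying the file to all five locations by hand is easy to get wrong, so a small loop can help. The following is a minimal sketch: copy_core_site is a hypothetical helper name, and it assumes the five directories listed above already exist under your installation path. It reports any directory it cannot find and does not restart any service.

```shell
# Sketch only: copy_core_site is a hypothetical helper, not an Incorta tool.
# Usage: copy_core_site /path/to/core-site.xml <INCORTA_INSTALLATION_PATH>
copy_core_site() (
  src=$1
  home=$2
  status=0

  # The five target directories listed in this section.
  for dir in \
    cmc/lib \
    cmc/tmt \
    IncortaNode/hadoop/etc/hadoop \
    IncortaNode/runtime/lib \
    IncortaNode/runtime/webapps/incorta/WEB-INF/lib
  do
    if [ -d "$home/$dir" ]; then
      cp "$src" "$home/$dir/core-site.xml" || status=1
    else
      echo "missing directory: $home/$dir" >&2
      status=1
    fi
  done

  # Non-zero exit status if any copy failed or a directory was missing.
  exit "$status"
)
```

The function returns a non-zero status when any target is missing, so you can check the result before restarting the services.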
Declare an environment variable on your Incorta host
On the host running your Incorta nodes, declare the following environment variable in ~/.bash_profile or ~/.bashrc:
export INCORTA_USE_AZURE_APIS=true
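If you script the host setup, it helps to add the line only once, even when the script runs repeatedly. The following is a minimal sketch: add_azure_env is a hypothetical helper name, and the default target of ~/.bashrc is an assumption you can override by passing another file.

```shell
# Sketch only: add_azure_env is a hypothetical helper, not an Incorta tool.
# Usage: add_azure_env [PROFILE_FILE]   (defaults to ~/.bashrc)
add_azure_env() (
  profile=${1:-"$HOME/.bashrc"}
  line='export INCORTA_USE_AZURE_APIS=true'

  touch "$profile"

  # Append only when the exact line is not already present.
  if ! grep -qxF "$line" "$profile"; then
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$profile" ] && [ -n "$(tail -c 1 "$profile")" ]; then
      printf '\n' >> "$profile"
    fi
    printf '%s\n' "$line" >> "$profile"
  fi
)
```

The variable takes effect in new shell sessions, or after you source the profile file in the current one.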
Create an ADLS Gen2 Tenant in the CMC
To create an ADLS Gen2 Tenant in the CMC, follow these steps:
Sign in to the CMC.
In the Navigation bar, select Clusters.
In the cluster list, select a Cluster name.
In the canvas tabs, select Tenants.
Select + → Create Tenant.
Enter a Tenant Name, Username, Password, and Email.
Enter the Shared Storage Path. For ADLS Gen2, the path will be:
abfs://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<DIRECTORY_PATH>
You will need to have read/write permission to the ADLS Gen2 path.
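The Shared Storage Path is easy to mistype, so it can be assembled from its three parts. The following is a minimal sketch: abfs_path is a hypothetical helper name, and the container, storage account and directory values in the usage note are made-up examples.

```shell
# Sketch only: abfs_path is a hypothetical helper, not an Incorta tool.
# Usage: abfs_path CONTAINER_NAME STORAGE_ACCOUNT_NAME [DIRECTORY_PATH]
abfs_path() (
  container=${1:-}
  account=${2:-}
  dir=${3:-}
  dir=${dir#/}   # drop one leading slash so the URI has a single separator

  if [ -z "$container" ] || [ -z "$account" ]; then
    echo "usage: abfs_path CONTAINER_NAME STORAGE_ACCOUNT_NAME [DIRECTORY_PATH]" >&2
    exit 2
  fi

  printf 'abfs://%s@%s.dfs.core.windows.net/%s\n' "$container" "$account" "$dir"
)
```

For example, abfs_path incorta mystorage tenants/demo prints abfs://incorta@mystorage.dfs.core.windows.net/tenants/demo, which you can paste into the Shared Storage Path field.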
Whitelist the ADLS Gen2 endpoints
Whitelist the following ADLS Gen2 endpoints to ensure they are accessible:
Verify the Wildfly JAR version
On certain operating systems, the files wildfly-openssl-1.0.4.Final.jar and wildfly-openssl-1.0.7.Final.jar both exist under the following path: <INCORTA_INSTALLATION_PATH>/IncortaNode/hadoop/share/hadoop/tools/lib/
In this situation, you will need to remove wildfly-openssl-1.0.4.Final.jar so that only wildfly-openssl-1.0.7.Final.jar exists. You can back up wildfly-openssl-1.0.4.Final.jar to a different directory as needed.
You will need to restart Spark, the CMC, and the Analytics and Loader services after you remove wildfly-openssl-1.0.4.Final.jar.
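The check and the backup can be combined in one step. The following is a minimal sketch: dedupe_wildfly_jar is a hypothetical helper name, and the backup directory is whatever location you choose. It moves the 1.0.4 JAR only when both versions are present, and it does not restart any service.

```shell
# Sketch only: dedupe_wildfly_jar is a hypothetical helper, not an Incorta tool.
# Usage: dedupe_wildfly_jar LIB_DIR BACKUP_DIR
dedupe_wildfly_jar() (
  lib=$1
  backup=$2
  old="$lib/wildfly-openssl-1.0.4.Final.jar"
  new="$lib/wildfly-openssl-1.0.7.Final.jar"

  if [ -f "$old" ] && [ -f "$new" ]; then
    # Both versions exist: move the older JAR out of the lib directory.
    mkdir -p "$backup" && mv "$old" "$backup/" &&
      echo "moved wildfly-openssl-1.0.4.Final.jar to $backup"
  else
    echo "both JAR versions not found in $lib; nothing to do"
  fi
)
```

Here LIB_DIR would be <INCORTA_INSTALLATION_PATH>/IncortaNode/hadoop/share/hadoop/tools/lib on your host.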