Configure a Tenant on ADLS Gen2
This content applies to On-Premises installations of releases starting 2024.1.x.
You can configure an Incorta Tenant on Azure Data Lake Storage (ADLS) Gen2, using it as a file system to save files, including Parquet and snapshot files. Here are the steps:
- Create a core-site.xml file using any text editor.
- Secure sensitive data included in the file.
- Propagate the file to the required locations.
- Create an ADLS Gen2 Tenant in the Cluster Management Console (CMC).
- If required, whitelist the ADLS Gen2 endpoints.
Create a core-site.xml file
Create a file and name it core-site.xml. You will need to add your ADLS Gen2 credentials to the file. After creating the file, you can store the credentials securely in a Java KeyStore so that they do not appear in plain text in core-site.xml.
Then, add the file to the required locations.
The content of the core-site.xml file varies according to the authentication type you use: OAuth or Shared Key.
OAuth authentication
The following is the content of the core-site.xml file in the case of OAuth authentication:
<configuration>
  <!-- other configuration -->
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/$$TENANT_ID$$/oauth2/token</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>$$CLIENT_ID$$</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>$$CLIENT_SECRET$$</value>
  </property>
  <!-- other configuration -->
</configuration>
Replace $$TENANT_ID$$, $$CLIENT_ID$$, and $$CLIENT_SECRET$$ with your ADLS Gen2 credentials. You can get these from the Azure web console.
For more details, refer to Register an application in Microsoft Entra ID.
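If you prefer to generate the file rather than edit it by hand, the OAuth properties above can be written with a short script. The following is a minimal sketch, not an Incorta utility: the function name build_oauth_core_site and its parameters are hypothetical, and it emits only the five OAuth properties listed in this section, so any other configuration you need must be added separately.

```python
import xml.etree.ElementTree as ET


def build_oauth_core_site(tenant_id, client_id, client_secret):
    """Return core-site.xml content for OAuth authentication as a string.

    Hypothetical helper: it covers only the properties shown in this guide.
    """
    properties = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent is available in Python 3.9 and later; older versions
    # produce valid but unindented XML.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


# Placeholders are left as-is here; substitute your own values.
print(build_oauth_core_site("$$TENANT_ID$$", "$$CLIENT_ID$$", "$$CLIENT_SECRET$$"))
```

Because ElementTree escapes special characters in the values, a client secret that contains characters such as & or < still produces well-formed XML.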
Shared Key authentication
The following is the content of the core-site.xml file in the case of Shared Key authentication:
<configuration>
  <!-- other configuration -->
  <property>
    <name>fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
    <value>SharedKey</value>
  </property>
  <property>
    <name>fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
    <value>$$SHARED_KEY$$</value>
  </property>
  <!-- other configuration -->
</configuration>
Replace $$ACCOUNT_NAME$$ and $$SHARED_KEY$$ with your Azure storage account name and the associated access key.
For more details, refer to Manage storage account access keys.
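Unlike the OAuth properties, the Shared Key property names embed the storage account name. The sketch below only illustrates how the two names are derived from the account name; the function shared_key_properties is a hypothetical helper, not part of Incorta or Hadoop.

```python
def shared_key_properties(account_name, shared_key):
    """Return the two Shared Key properties, keyed by property name.

    Hypothetical helper that mirrors the property names shown in this guide.
    """
    host = f"{account_name}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "SharedKey",
        f"fs.azure.account.key.{host}": shared_key,
    }


# Placeholders are left as-is here; substitute your own values.
for name, value in shared_key_properties("$$ACCOUNT_NAME$$", "$$SHARED_KEY$$").items():
    print(f"{name} = {value}")
```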
Secure the core-site.xml sensitive data
Azure credentials and account details, including the client ID and secret or the Shared Key (storage account key), can be added to the core-site.xml file in plain text or stored securely in a Java KeyStore.
You can secure the core-site.xml credentials using Incorta scripts or using the Hadoop binaries. Incorta scripts automate a few steps and do not require installing the Hadoop binaries on the node.
Securing the core-site.xml using Incorta
This method is available starting with 2024.7.4, enabling the storage of the sensitive data in a Java KeyStore. The KeyStore will be stored locally on the host machine where the Incorta node, whether a service node or the CMC node, is installed.
Here are the steps required to be performed on each Incorta node:
- Set the HADOOP_CREDSTORE_PASSWORD environment variable to a value representing the KeyStore password on the host machine. The steps vary depending on the operating system. The KeyStore password must be at least 4 characters long; however, it is recommended to use a strong password.
- Navigate to the keystore.properties file:
  - For the CMC node: <INCORTA_HOME>/cmc/bin/keystore-cli/keystore.properties
  - For a service node: <INCORTA_HOME>/IncortaNode/bin/keystore-cli/keystore.properties
- Populate the keystore.properties file with the aliases and keys that you want to protect: the alias is the property name within the core-site.xml file and the corresponding key should be the value of the same property. Multiple aliases and keys can be added in a comma-separated format. Example:
    aliases=fs.azure.account.oauth2.client.id,fs.azure.account.oauth2.client.secret
    keys=Abc123-Def456-Ghi789,Xyz~098stu765
  Note
  - The number of aliases must match the number of keys.
  - The values that can be encrypted vary according to the authentication type: Shared Key or OAuth.
    - In the case of OAuth authentication, the values of the following properties can be encrypted:
      fs.azure.account.oauth2.client.endpoint
      fs.azure.account.oauth2.client.id
      fs.azure.account.oauth2.client.secret
    - In the case of Shared Key authentication, the value of the following property can be encrypted:
      fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net
- Copy the core-site.xml file to the bin directory directly under the Incorta node (<INCORTA_HOME>/cmc/bin/ or <INCORTA_HOME>/IncortaNode/bin/), and then navigate to this bin directory and run one of the following commands as appropriate:
  - For the CMC node:
    python3 update_core_site_cmc.py --keystore
  - For a service node:
    python3 update_core_site_incortaNode.py --keystore
  Note
  Running the Python script with the keystore argument performs the following:
  - Creates the KeyStore.
  - Modifies the core-site.xml file to mask the sensitive credentials specified in the keystore.properties file and adds the new property, hadoop.security.credential.provider.path, which points to the KeyStore.
  - Distributes the modified core-site.xml file to the required paths according to the node type.
- Restart Incorta services and CMC.
Additional Considerations
When using the update_core_site script to create the KeyStore, securely store credentials, and distribute the core-site.xml file, consider the following:
- Do not add the authentication type property to the keystore.properties file:
  - OAuth: fs.azure.account.auth.type
  - Shared Key: fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net
- After creating the core-site.xml file with the masked credentials, clear the keystore.properties file on all nodes to remove the plain-text credentials.
- When you update nodes that already have core-site.xml files, the Python script creates backups of those files before updating them with the masked credentials. These backup files also contain the credentials in plain text. Make sure to delete them.
- Running the script without the keystore argument only distributes the original core-site.xml file with the credentials in plain text.
- Running the script with the keystore argument while the HADOOP_CREDSTORE_PASSWORD environment variable is not set or the keystore.properties file is empty throws an error.
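Several of the conditions above can be checked before running the script. The following is a minimal pre-flight sketch under stated assumptions: it is not part of the Incorta scripts, the function names are hypothetical, and it assumes keystore.properties holds one aliases= line and one keys= line as in the earlier example. It splits values on commas, so it cannot handle a key that itself contains a comma.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        entries[name.strip()] = value.strip()
    return entries


def split_list(value):
    """Split a comma-separated value into non-empty, trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_keystore_inputs(properties_text, env=None):
    """Return a list of problems; an empty list means no problem was found."""
    env = os.environ if env is None else env
    problems = []

    password = env.get("HADOOP_CREDSTORE_PASSWORD", "")
    if not password:
        problems.append("HADOOP_CREDSTORE_PASSWORD is not set")
    elif len(password) < 4:
        problems.append("KeyStore password is shorter than 4 characters")

    entries = parse_properties(properties_text)
    aliases = split_list(entries.get("aliases", ""))
    keys = split_list(entries.get("keys", ""))
    if not aliases and not keys:
        problems.append("keystore.properties is empty")
    elif len(aliases) != len(keys):
        problems.append("number of aliases does not match number of keys")

    # The authentication type property must stay in core-site.xml.
    for alias in aliases:
        if alias.startswith("fs.azure.account.auth.type"):
            problems.append(f"authentication type property must not be listed: {alias}")
    return problems
```

To use it, read the keystore.properties file for the node into a string and pass it to check_keystore_inputs; the environment defaults to the current process environment.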
Securing the core-site.xml using the Hadoop binaries
Follow these steps to create a KeyStore with encrypted credentials using Hadoop binaries, and reference the credentials in the core-site.xml file.
- Download the Hadoop binaries.
- Navigate to the bin directory in the Hadoop installation folder.
- Run the required commands to create the KeyStore and add the sensitive credentials to it.
  - In the case of Shared Key authentication, run the following command after replacing $$ACCOUNT_NAME$$ and $$SHARED_KEY$$ with the actual values and $$KEYSTORE_PATH$$ with the absolute path where the KeyStore will be created.
    ./hadoop credential create fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net -value $$SHARED_KEY$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
  - In the case of OAuth authentication, run the following commands after replacing $$CLIENT_ID$$ and $$CLIENT_SECRET$$ with your credentials, $$ENDPOINT$$ with the OAuth token endpoint that contains your tenant ID (https://login.microsoftonline.com/$$TENANT_ID$$/oauth2/token), and $$KEYSTORE_PATH$$ with the absolute path where the KeyStore will be created.
    ./hadoop credential create fs.azure.account.oauth2.client.id -value $$CLIENT_ID$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
    ./hadoop credential create fs.azure.account.oauth2.client.secret -value $$CLIENT_SECRET$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
    ./hadoop credential create fs.azure.account.oauth2.client.endpoint -value $$ENDPOINT$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
- Edit the core-site.xml file to remove the properties related to the encrypted credentials and add a new property pointing to the previously created KeyStore.
  - In the case of OAuth authentication, the core-site.xml file should be as follows:
    <configuration>
      <property>
        <name>hadoop.security.credential.provider.path</name>
        <value>jceks://file/$$KEYSTORE_PATH$$.jceks</value>
      </property>
      <property>
        <name>fs.azure.account.auth.type</name>
        <value>OAuth</value>
      </property>
      <property>
        <name>fs.azure.account.oauth.provider.type</name>
        <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
      </property>
    </configuration>
  - In the case of Shared Key authentication, the core-site.xml file should be as follows:
    <configuration>
      <property>
        <name>hadoop.security.credential.provider.path</name>
        <value>jceks://file/$$KEYSTORE_PATH$$.jceks</value>
      </property>
      <property>
        <name>fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
        <value>SharedKey</value>
      </property>
    </configuration>
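When several credentials go into the same KeyStore, it can help to generate the hadoop credential create command lines from one mapping so that every alias uses the same provider URI. The sketch below only builds and prints the commands; it does not run them. The function credential_commands is hypothetical, it uses only the arguments shown in this section, and it substitutes the KeyStore path into the jceks://file/$$KEYSTORE_PATH$$.jceks pattern unchanged.

```python
import shlex


def credential_commands(credentials, keystore_path):
    """Build one `hadoop credential create` argument list per alias.

    Hypothetical helper: `credentials` maps each alias (the core-site.xml
    property name) to the value to store.
    """
    provider = f"jceks://file/{keystore_path}.jceks"
    return [
        ["./hadoop", "credential", "create", alias,
         "-value", value, "-provider", provider]
        for alias, value in credentials.items()
    ]


# Placeholders are left as-is here; substitute your own values.
oauth_credentials = {
    "fs.azure.account.oauth2.client.id": "$$CLIENT_ID$$",
    "fs.azure.account.oauth2.client.secret": "$$CLIENT_SECRET$$",
    "fs.azure.account.oauth2.client.endpoint": "$$ENDPOINT$$",
}
for command in credential_commands(oauth_credentials, "$$KEYSTORE_PATH$$"):
    # shlex.join (Python 3.8+) quotes each argument for safe pasting into a shell.
    print(shlex.join(command))
```

Keeping each command as an argument list rather than a single string means a secret that contains spaces or shell metacharacters is quoted correctly when printed.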
Propagate the core-site.xml file
If you have secured the core-site.xml using Incorta, skip this section.
After creating the core-site.xml file, you must copy it to the following paths:
- On the CMC node:
  - <INCORTA_HOME>/cmc/bin
  - <INCORTA_HOME>/cmc/tmt/lib/
  - <INCORTA_HOME>/cmc/inspector/
- On the service node:
  - <INCORTA_HOME>/IncortaNode/bin
  - <INCORTA_HOME>/IncortaNode/spark/conf/
  - <INCORTA_HOME>/IncortaNode/runtime/lib/
  - <INCORTA_HOME>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib/
  - <INCORTA_HOME>/IncortaNode/sqli/runtime/lib/
You can copy the core-site.xml file manually or using the update_core_site script.
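For the manual route, the copy can be scripted so that no target directory is missed. The following is a minimal sketch, not a replacement for the update_core_site script: the names TARGETS and propagate_core_site are hypothetical, the relative paths are taken from the lists in this section, and the sketch assumes every target directory already exists.

```python
import shutil
from pathlib import Path

# Target directories relative to <INCORTA_HOME>, as listed in this guide.
TARGETS = {
    "cmc": [
        "cmc/bin",
        "cmc/tmt/lib",
        "cmc/inspector",
    ],
    "service": [
        "IncortaNode/bin",
        "IncortaNode/spark/conf",
        "IncortaNode/runtime/lib",
        "IncortaNode/runtime/webapps/incorta/WEB-INF/lib",
        "IncortaNode/sqli/runtime/lib",
    ],
}


def propagate_core_site(core_site, incorta_home, node_type):
    """Copy core-site.xml into every target directory for the node type.

    Checks that all directories exist before copying anything, so a
    missing directory does not leave the node half-updated.
    """
    source = Path(core_site)
    target_dirs = [Path(incorta_home) / relative for relative in TARGETS[node_type]]
    missing = [str(d) for d in target_dirs if not d.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing target directories: {missing}")

    copied = []
    for target_dir in target_dirs:
        destination = target_dir / "core-site.xml"
        # Skip the copy when the source already sits in this directory.
        if source.resolve() != destination.resolve():
            shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

A call such as propagate_core_site("core-site.xml", incorta_home, "service") would be run once per node, with incorta_home set to that node's installation directory.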
Propagate the file via the update_core_site script
On the CMC node:
- Manually copy the core-site.xml to the <INCORTA_HOME>/cmc/bin directory.
- Navigate to the <INCORTA_HOME>/cmc/bin directory, and then run the following script:
  python3 update_core_site_cmc.py
On the service node:
- Manually copy the core-site.xml to the <INCORTA_HOME>/IncortaNode/bin directory.
- Navigate to the <INCORTA_HOME>/IncortaNode/bin directory, and then run the following script:
  python3 update_core_site_incortaNode.py
Create an ADLS Gen2 Tenant in the CMC
Follow these steps to create an ADLS Gen2 Tenant in the CMC:
- Sign in to the CMC.
- In the navigation bar, select Clusters.
- In the clusters list, select a cluster name.
- In the canvas tabs, select Tenants.
- Select + > Create Tenant.
- Enter a Tenant Name, Username, Password, and Email.
- Enter the Shared Storage Path. For ADLS Gen2, the path will be:
abfs://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<DIRECTORY_PATH>
You will need to have read/write permission to the ADLS Gen2 path.
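The Shared Storage Path is easy to mistype because the container comes before the @ and the storage account after it. As an illustration only, the path pattern above can be assembled like this; the function name and the sample values in the comment are hypothetical.

```python
def adls_shared_storage_path(container, storage_account, directory=""):
    """Assemble an abfs:// path following the pattern shown in this guide."""
    # Drop surrounding slashes so the result has exactly one separator.
    directory = directory.strip("/")
    return f"abfs://{container}@{storage_account}.dfs.core.windows.net/{directory}"


# Example with made-up names:
# adls_shared_storage_path("tenants", "mystorageaccount", "incorta/tenant1")
```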
Whitelist the ADLS Gen2 endpoints
If the Incorta node is protected by a firewall or has limited internet access, whitelist the following ADLS Gen2 endpoints to ensure they are accessible:
*.dfs.core.windows.net
*.blob.core.windows.net
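After updating the firewall rules, you may want to confirm from the Incorta node that the endpoints for your storage account are reachable. The sketch below is one way to do that with a plain TCP connection. It assumes the endpoints are served over HTTPS on port 443, which this guide does not state, and the function names are hypothetical. A successful connection shows only that the network path is open, not that the credentials are valid.

```python
import socket


def adls_endpoints(storage_account):
    """Return the two endpoint host names for a storage account."""
    return [
        f"{storage_account}.dfs.core.windows.net",
        f"{storage_account}.blob.core.windows.net",
    ]


def is_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 443 is an assumption (HTTPS); adjust it if your setup differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage on the Incorta node, with your own storage account name:
# for host in adls_endpoints("<STORAGE_ACCOUNT_NAME>"):
#     print(host, "reachable" if is_reachable(host) else "blocked or unreachable")
```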