Configure a Tenant on ADLS Gen2

Context

This content applies to On-Premises installations of release 2024.1.x and later.

You can configure an Incorta Tenant on Azure Data Lake Storage (ADLS) Gen2, using it as a file system to save files, including Parquet and snapshot files. Here are the steps:

Create a core-site.xml file

Create a file and name it core-site.xml, and then add your ADLS Gen2 credentials to it. After creating the file, you can store the credentials securely in a Java KeyStore so that they do not appear in plain text in the core-site.xml file. Finally, add the file to the required locations.

The content of the core-site.xml file varies according to the authentication type you use: OAuth or Shared Key.

OAuth authentication

The following is the content of the core-site.xml file in the case of OAuth authentication:

<configuration>
<!-- other configuration -->
<property>
<name>fs.azure.account.auth.type</name>
<value>OAuth</value>
</property>
<property>
<name>fs.azure.account.oauth.provider.type</name>
<value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
<property>
<name>fs.azure.account.oauth2.client.endpoint</name>
<value>https://login.microsoftonline.com/$$TENANT_ID$$/oauth2/token</value>
</property>
<property>
<name>fs.azure.account.oauth2.client.id</name>
<value>$$CLIENT_ID$$</value>
</property>
<property>
<name>fs.azure.account.oauth2.client.secret</name>
<value>$$CLIENT_SECRET$$</value>
</property>
<!-- other configuration -->
</configuration>
Note

Replace $$TENANT_ID$$, $$CLIENT_ID$$, and $$CLIENT_SECRET$$ with your ADLS Gen2 credentials. You can get these from the Azure web console.

For more details, refer to Register an application in Microsoft Entra ID.
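The OAuth file above can also be generated rather than typed by hand. The following is a minimal Python sketch, assuming Python 3.9 or later (for `ET.indent`); the function name `build_oauth_core_site` and the sample values are hypothetical and not part of Incorta. It only writes the five properties listed above.

```python
import xml.etree.ElementTree as ET


def build_oauth_core_site(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Return core-site.xml text containing the five OAuth properties."""
    properties = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # The token endpoint embeds the tenant ID, as in the template above.
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file named core-site.xml gives the plain-text variant of the file; the credentials are still unprotected at this point.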

Shared Key authentication

The following is the content of the core-site.xml file in the case of Shared Key authentication:

<configuration>
<!-- other configuration -->
<property>
<name>fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
<value>SharedKey</value>
</property>
<property>
<name>fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
<value>$$SHARED_KEY$$</value>
</property>
<!-- other configuration -->
</configuration>
Note

Replace $$ACCOUNT_NAME$$ and $$SHARED_KEY$$ with your Azure storage account name and the associated access key.

For more details, refer to Manage storage account access keys.
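Before moving on, it can help to confirm that the plain-text file contains every property for the chosen authentication type and that no `$$...$$` placeholder was left unreplaced. The following Python sketch illustrates one way to do this; the helper names are hypothetical, and the check only covers the properties shown in the two templates above.

```python
import xml.etree.ElementTree as ET

# Property names taken from the OAuth template.
OAUTH_REQUIRED = [
    "fs.azure.account.auth.type",
    "fs.azure.account.oauth.provider.type",
    "fs.azure.account.oauth2.client.endpoint",
    "fs.azure.account.oauth2.client.id",
    "fs.azure.account.oauth2.client.secret",
]


def shared_key_required(account_name: str) -> list:
    """Property names from the Shared Key template for one storage account."""
    suffix = f"{account_name}.dfs.core.windows.net"
    return [
        f"fs.azure.account.auth.type.{suffix}",
        f"fs.azure.account.key.{suffix}",
    ]


def unresolved_properties(xml_text: str, required: list) -> list:
    """Return required names that are missing, empty, or still a placeholder."""
    root = ET.fromstring(xml_text)
    present = {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }
    problems = []
    for name in required:
        value = present.get(name, "")
        if not value or "$$" in value:
            problems.append(name)
    return problems
```

An empty list means that all expected properties are present with real values.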

Secure the core-site.xml sensitive data

Azure credentials and account details, including the client ID and secret or the Shared Key (storage account key), can be added to the core-site.xml file in plain text or stored securely in a Java KeyStore.

You can secure the core-site.xml credentials using Incorta scripts or the Hadoop binaries. The Incorta scripts automate several steps and do not require installing the Hadoop binaries on the node.

Securing the core-site.xml using Incorta

This method, available starting with release 2024.7.4, stores the sensitive data in a Java KeyStore. The KeyStore is saved locally on the host machine where the Incorta node, whether a service node or the CMC node, is installed.

Perform the following steps on each Incorta node:

  1. Set the HADOOP_CREDSTORE_PASSWORD environment variable to a value representing the KeyStore password on the host machine. The steps vary depending on the operating system. The KeyStore password must be at least 4 characters long; however, it is recommended to use a strong password.
  2. Navigate to the keystore.properties file:
    • For the CMC node: <INCORTA_HOME>/cmc/bin/keystore-cli/keystore.properties.
    • For a service node: <INCORTA_HOME>/IncortaNode/bin/keystore-cli/keystore.properties.
  3. Populate the keystore.properties file with the aliases and keys that you want to protect: the alias is the property name within the core-site.xml file and the corresponding key should be the value of the same property. Multiple aliases and keys can be added in a comma-separated format. Example:
    aliases=fs.azure.account.oauth2.client.id,fs.azure.account.oauth2.client.secret
    keys=Abc123-Def456-Ghi789,Xyz~098stu765
    Note
    • The number of aliases must match the number of keys.
    • The values that can be encrypted vary according to the authentication type: Shared Key or OAuth.
      • In the case of OAuth authentication, the values of the following properties can be encrypted:
        • fs.azure.account.oauth2.client.endpoint
        • fs.azure.account.oauth2.client.id
        • fs.azure.account.oauth2.client.secret
      • In the case of Shared Key authentication, the value of the following property can be encrypted:
        • fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net
  4. Copy the core-site.xml file to the bin directory directly under the Incorta node (<INCORTA_HOME>/cmc/bin/ or <INCORTA_HOME>/IncortaNode/bin/), and then navigate to this bin directory and run one of the following commands as appropriate:
    • For the CMC node: python3 update_core_site_cmc.py --keystore
    • For a service node: python3 update_core_site_incortaNode.py --keystore
      Note

      Running the Python script with the keystore argument performs the following:

      • Creates the KeyStore.
      • Modifies the core-site.xml file to mask the sensitive credentials specified in the keystore.properties file and adds a new property, hadoop.security.credential.provider.path, that points to the KeyStore.
      • Distributes the modified core-site.xml file to the required paths according to the node type.
  5. Restart Incorta services and CMC.

Additional Considerations

When using the update_core_site script to create the KeyStore, securely store credentials, and distribute the core-site.xml file, consider the following:

  • Do not add the authentication type property to the keystore.properties file:
    • OAuth: fs.azure.account.auth.type
    • Shared Key: fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net
  • After creating the core-site.xml file with the masked credentials, clear the keystore.properties file on all nodes to remove the plain credentials.
  • On nodes that already have core-site.xml files, the Python script backs up the existing files before updating them with the masked credentials. These backup files still contain the credentials in plain text. Make sure to delete them.
  • Running the script without the keystore argument only distributes the original core-site.xml file with the credentials in plain text.
  • Running the script with the keystore argument while the HADOOP_CREDSTORE_PASSWORD environment variable is not set or the keystore.properties file is empty throws an error.
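Several of the conditions above can be checked before running the script. The following Python sketch is illustrative only and is not part of the Incorta scripts: the function names are hypothetical, and the parser assumes the simple `aliases=...` and `keys=...` layout shown in the example, with values that contain no commas.

```python
AUTH_TYPE_PREFIX = "fs.azure.account.auth.type"


def parse_keystore_properties(text: str) -> dict:
    """Parse 'name=a,b,c' lines into {name: [a, b, c]}."""
    entries = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        entries[name.strip()] = [item.strip() for item in value.split(",") if item.strip()]
    return entries


def precheck(properties_text: str, password) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not password:
        problems.append("HADOOP_CREDSTORE_PASSWORD is not set")
    elif len(password) < 4:
        problems.append("KeyStore password is shorter than 4 characters")

    entries = parse_keystore_properties(properties_text)
    aliases = entries.get("aliases", [])
    keys = entries.get("keys", [])
    if not aliases or not keys:
        problems.append("keystore.properties has no aliases or keys")
    elif len(aliases) != len(keys):
        problems.append("number of aliases does not match number of keys")

    for alias in aliases:
        # The authentication type property must stay in core-site.xml.
        if alias.startswith(AUTH_TYPE_PREFIX):
            problems.append(f"authentication type property listed as alias: {alias}")
    return problems
```

In practice, the password argument would come from `os.environ.get("HADOOP_CREDSTORE_PASSWORD")` and the text from the keystore.properties file on the node being configured.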

Securing the core-site.xml using the Hadoop binaries

Follow these steps to create a KeyStore with encrypted credentials using the Hadoop binaries, and to reference the KeyStore in the core-site.xml file.

  1. Download the Hadoop binaries.

  2. Navigate to the bin directory in the Hadoop installation folder.

  3. Run the required commands to create the KeyStore and add the sensitive credentials to it.

    • In the case of Shared Key authentication, run the following command after replacing $$ACCOUNT_NAME$$ and $$SHARED_KEY$$ with the actual values and $$KEYSTORE_PATH$$ with the absolute path where the KeyStore will be created.

      ./hadoop credential create fs.azure.account.key.$$ACCOUNT_NAME$$.dfs.core.windows.net -value $$SHARED_KEY$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
    • In the case of OAuth authentication, run the following commands after replacing $$CLIENT_ID$$ and $$CLIENT_SECRET$$ with your credentials, $$ENDPOINT$$ with the OAuth token endpoint URL (which contains your $$TENANT_ID$$), and $$KEYSTORE_PATH$$ with the absolute path where the KeyStore will be created.

      ./hadoop credential create fs.azure.account.oauth2.client.id -value $$CLIENT_ID$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
      ./hadoop credential create fs.azure.account.oauth2.client.secret -value $$CLIENT_SECRET$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
      ./hadoop credential create fs.azure.account.oauth2.client.endpoint -value $$ENDPOINT$$ -provider jceks://file/$$KEYSTORE_PATH$$.jceks
  4. Edit the core-site.xml file to remove the properties related to the encrypted credentials and add a new property pointing to the previously created KeyStore.

    • In the case of OAuth authentication, the core-site.xml file should be as follows:

      <configuration>
      <property>
      <name>hadoop.security.credential.provider.path</name>
      <value>jceks://file/$$KEYSTORE_PATH$$.jceks</value>
      </property>
      <property>
      <name>fs.azure.account.auth.type</name>
      <value>OAuth</value>
      </property>
      <property>
      <name>fs.azure.account.oauth.provider.type</name>
      <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
      </property>
      </configuration>
    • In the case of Shared Key authentication, the core-site.xml file should be as follows:

      <configuration>
      <property>
      <name>hadoop.security.credential.provider.path</name>
      <value>jceks://file/$$KEYSTORE_PATH$$.jceks</value>
      </property>
      <property>
      <name>fs.azure.account.auth.type.$$ACCOUNT_NAME$$.dfs.core.windows.net</name>
      <value>SharedKey</value>
      </property>
      </configuration>
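Steps 3 and 4 can be prepared together from an existing plain-text file. The following Python sketch (Python 3.9 or later) takes the plain core-site.xml text, removes the properties you name as sensitive, adds the hadoop.security.credential.provider.path property, and returns the matching `hadoop credential create` argument lists using only the `-value` and `-provider` options shown above. The function name `secure_core_site` and the sample KeyStore URI are hypothetical; the sketch builds the commands but does not run them.

```python
import xml.etree.ElementTree as ET

PROVIDER_PROPERTY = "hadoop.security.credential.provider.path"


def secure_core_site(xml_text: str, sensitive_names: set, keystore_uri: str):
    """Return (secured core-site.xml text, list of credential-create argv lists)."""
    root = ET.fromstring(xml_text)
    commands = []
    for prop in list(root.findall("property")):
        name = prop.findtext("name")
        if name in sensitive_names:
            value = prop.findtext("value") or ""
            # Same shape as the commands in step 3.
            commands.append([
                "./hadoop", "credential", "create", name,
                "-value", value,
                "-provider", keystore_uri,
            ])
            root.remove(prop)  # the value now lives only in the KeyStore

    provider = ET.Element("property")
    ET.SubElement(provider, "name").text = PROVIDER_PROPERTY
    ET.SubElement(provider, "value").text = keystore_uri
    root.insert(0, provider)

    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode"), commands
```

For Shared Key authentication, `sensitive_names` would hold only the fs.azure.account.key property for your account; for OAuth, the client ID, client secret, and endpoint properties. Each returned argument list can be run from the Hadoop bin directory, for example with `subprocess.run(args, check=True)`.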

Propagate the core-site.xml file

Note

If you have secured the core-site.xml using Incorta, skip this section.

After creating the core-site.xml, you must copy it to the following paths:

  • On the CMC node:
    • <INCORTA_HOME>/cmc/bin
    • <INCORTA_HOME>/cmc/tmt/lib/
    • <INCORTA_HOME>/cmc/inspector/
  • On the service node:
    • <INCORTA_HOME>/IncortaNode/bin
    • <INCORTA_HOME>/IncortaNode/spark/conf/
    • <INCORTA_HOME>/IncortaNode/runtime/lib/
    • <INCORTA_HOME>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib/
    • <INCORTA_HOME>/IncortaNode/sqli/runtime/lib/

You can copy the core-site.xml file manually or using the update_core_site script.
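For the manual route, the copy can be sketched in Python as follows. This is an illustration, not an Incorta utility: the function name `propagate` is hypothetical, the relative paths are taken from the list above, and the sketch assumes that the target directories already exist under `<INCORTA_HOME>`.

```python
import shutil
from pathlib import Path

# Target directories relative to <INCORTA_HOME>, as listed above.
CMC_TARGETS = ["cmc/bin", "cmc/tmt/lib", "cmc/inspector"]
SERVICE_TARGETS = [
    "IncortaNode/bin",
    "IncortaNode/spark/conf",
    "IncortaNode/runtime/lib",
    "IncortaNode/runtime/webapps/incorta/WEB-INF/lib",
    "IncortaNode/sqli/runtime/lib",
]


def propagate(core_site, incorta_home, targets) -> list:
    """Copy core-site.xml into each target directory and return the new paths."""
    source = Path(core_site).resolve()
    copied = []
    for relative in targets:
        target_dir = Path(incorta_home) / relative
        if not target_dir.is_dir():
            raise FileNotFoundError(f"Missing target directory: {target_dir}")
        destination = target_dir / "core-site.xml"
        if destination.resolve() == source:
            continue  # the source file already sits in this directory
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

Call it with `CMC_TARGETS` on the CMC node and with `SERVICE_TARGETS` on each service node.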

Propagate the file via the update_core_site script

On the CMC node:

  1. Manually copy the core-site.xml to the <INCORTA_HOME>/cmc/bin directory.
  2. Navigate to the <INCORTA_HOME>/cmc/bin directory, and then run the following script:
    python3 update_core_site_cmc.py

On the service node:

  1. Manually copy the core-site.xml to the <INCORTA_HOME>/IncortaNode/bin directory.
  2. Navigate to the <INCORTA_HOME>/IncortaNode/bin directory, and then run the following script:
    python3 update_core_site_incortaNode.py

Create an ADLS Gen2 Tenant in the CMC

Follow these steps to create an ADLS Gen2 Tenant in the CMC:

  • Sign in to the CMC.
  • In the navigation bar, select Clusters.
  • In the clusters list, select a cluster name.
  • In the canvas tabs, select Tenants.
  • Select + > Create Tenant.
    • Enter a Tenant Name, Username, Password, and Email.
    • Enter the Shared Storage Path. For ADLS Gen2, the path will be: abfs://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<DIRECTORY_PATH>
Note

You need read/write permission on the ADLS Gen2 path.
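The Shared Storage Path follows a fixed pattern, so it can be assembled from its three parts. The following Python sketch is a small illustration; the function name `abfs_path` and the sample values in the usage note are hypothetical.

```python
def abfs_path(container_name: str, storage_account_name: str, directory_path: str = "") -> str:
    """Build abfs://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<DIRECTORY_PATH>."""
    if not container_name or not storage_account_name:
        raise ValueError("container and storage account names are required")
    directory = directory_path.strip("/")  # avoid doubled or trailing slashes
    return f"abfs://{container_name}@{storage_account_name}.dfs.core.windows.net/{directory}"
```

For example, `abfs_path("tenants", "mystorageaccount", "/incorta/prod/")` returns `abfs://tenants@mystorageaccount.dfs.core.windows.net/incorta/prod`.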

Whitelist the ADLS Gen2 endpoints

If the Incorta node is protected by a firewall or has limited internet access, whitelist the following ADLS Gen2 endpoints to ensure they are accessible:

*.dfs.core.windows.net

*.blob.core.windows.net
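After updating the firewall rules, a quick connectivity test from the Incorta node can confirm that the endpoints for your storage account are reachable. The following Python sketch is a rough check under stated assumptions: it assumes HTTPS on port 443, the function names are hypothetical, and a successful TCP connection does not prove that authentication will succeed.

```python
import socket


def adls_hosts(storage_account_name: str) -> list:
    """Account-specific hosts that match the two wildcard endpoints above."""
    return [
        f"{storage_account_name}.dfs.core.windows.net",
        f"{storage_account_name}.blob.core.windows.net",
    ]


def is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable(host)` for each entry of `adls_hosts("<STORAGE_ACCOUNT_NAME>")` on every Incorta node shows which endpoints are still blocked.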