Tools → Data Agent
About a Data Agent
Rather than opening a VPN or SSH tunnel between your external database and your Incorta cluster, you can install and configure a Data Agent service to run on the database’s host or a host on the same subnet as the database host. Typically, the database host resides behind a corporate firewall or on another interdepartmental subnet.
A Data Agent implementation consists of:
- A data agent service that runs on a host.
- A data agent object in the Data Manager for a given tenant.
- An authentication file shared between the data agent object and the data agent service.
The Data Agent service supports the following data sources:
- Apache Drill
- Apache Hive
- Athena
- Cassandra (Simba)
- Custom SQL
- Data Lake Local Files
- IBM DB2
- MongoDB BI
- MySQL
- Netezza
- NetSuite SuiteAnalytics
- Oracle
- PostgreSQL
- Presto
- RedShift
- SAP ERP
- SAP Hana
- SAP Sybase IQ
- SQL Server
- SQL Server (JTDS)
- Teradata
- Vertica
The data agent service enables the extraction of data from one or more data sources behind a firewall to an Incorta cluster. This means that you can have a single data agent that connects to multiple data sources in your organization. Your Incorta cluster can reside on-premises or in the cloud.
Requirements for the Data Agent service host
Here are the requirements to run a Data Agent service on a host:
- Minimum of 16 GB RAM and 4 CPUs
- Ability to install Java or OpenJDK
- The host must not block outgoing connections
- The Incorta cluster must allow for two additional incoming ports
- MySQL driver
The connection between Incorta and a data agent service uses TLS/SSL authentication and requires a valid CA certificate or self-signed certificate. To learn more about TLS/SSL, refer to Security → HTTPS for Apache Tomcat with OpenSSL. The data agent encodes data for transfer using Google’s ProtoBuf library.
The data agent service connects to the Analytics Service and Loader Service through specific ports. The data agent service host must be able to accept incoming and outgoing communications from and to Incorta services.
Set up the Data Agent
Here are the high-level steps required to get the Data Agent working:
- Enable the Data Agent feature and get the installation package.
- Prepare the Data Agent Service host.
- Create a data agent object in the Data Manager and generate the authentication file.
- Copy the authentication file to the remote host.
- Start the Data Agent Service.
Before upgrading to a newer version of the Data Agent, you must stop the Data Agent service.
Enable the Data Agent feature and get the installation package
The steps required to enable the Data Agent and get the package vary according to the cluster installation: On-Premises or Cloud (cloud.incorta.com).
Starting 2024.1.x, enabling or disabling the Data Agent feature or changing the service ports no longer requires restarting the cluster or any service.
Enable the Data Agent for an On-Premises Incorta cluster
For On-Premises, you must contact Incorta Support directly for the Data Agent package.
You must also set the Data Agent configurations manually in the Cluster Management Console (CMC).
- Sign in as the CMC Administrator.
- In the Clusters Manager, select the cluster.
- In the Cluster Manager, select Cluster Configurations.
- In Server Configurations, in the left panel, select Data Agent.
- Turn on the Enable Data Agent toggle.
- Specify the Data Agent properties as required:
  - Analytics Data Agent Port
  - Analytics Data Agent Controller Port (required starting 2024.7.x)
  - Loader Data Agent Port
  - SQLi Data Agent Port
  - Analytics Public Hosts and Ports
  - Analytics Public Controller Hosts and Ports (required starting 2024.7.x)
  - Loader Public Hosts and Ports
  - SQLi Public Hosts and Ports
- Select Save.
Depending on your requirements, the host IP or DNS can be a public IP, public DNS, private IP, or private DNS.
Property | Description |
---|---|
Analytics Data Agent Port | The Analytics Service listens to a data agent service on this local port. |
Analytics Data Agent Controller Port | The Analytics Service listens to a data agent controller service on this local port. |
Loader Data Agent Port | The Loader Service listens to a data agent service on this local port. |
SQLi Data Agent Port | The SQL Interface Service listens to a data agent service on this local port. This option is available for On-Premises installations only. |
Analytics Public Hosts and Ports | The host IP or DNS and the port. The data agent service connects to the Analytics Service using this `HOST:PORT`. The connection is forwarded to the specified Analytics Data Agent Port. |
Analytics Public Controller Hosts and Ports | The host IP or DNS and the port. The data agent controller service connects to the Analytics Service using this `HOST:PORT`. The connection is forwarded to the specified Analytics Data Agent Controller Port. |
Loader Public Hosts and Ports | The host IP or DNS and the port. The data agent service connects to the Loader Service using this `HOST:PORT`. The connection is forwarded to the specified Loader Data Agent Port. |
SQLi Public Hosts and Ports | The host IP or host DNS and the port. The data agent service connects to the SQL Interface Service using this `HOST:PORT`. The connection is forwarded to the specified SQLi Data Agent Port. This option is available for On-Premises installations only. |
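Before starting the service, it can help to confirm from the data agent host that each public `HOST:PORT` value is reachable. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the host. The function names and the endpoint in the example are hypothetical placeholders, not Incorta defaults.

```shell
#!/usr/bin/env bash
# Sketch: check that a public HOST:PORT endpoint is reachable from the data agent host.
# The endpoint passed to these functions is a placeholder, not an Incorta default.

# Print the host part of a HOST:PORT string (everything before the last colon).
endpoint_host() { printf '%s\n' "${1%:*}"; }

# Print the port part of a HOST:PORT string (everything after the last colon).
endpoint_port() { printf '%s\n' "${1##*:}"; }

# Return 0 if a TCP connection to HOST:PORT opens within 5 seconds.
# Relies on bash's /dev/tcp redirection, so no extra network tool is required.
check_endpoint() {
  local host port
  host="$(endpoint_host "$1")"
  port="$(endpoint_port "$1")"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (hypothetical values):
#   check_endpoint analytics.example.com:9091 && echo reachable || echo blocked
```

A failed check only shows that the TCP connection could not be opened. It does not distinguish between a firewall rule, a wrong port, and a service that is not running.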
Enable and download the Data Agent for Incorta Cloud
For an Incorta Cloud cluster, you can enable and download the Data Agent from the Cloud Admin Portal.
- Sign in to your Cloud Admin Portal as the Cloud Administrator.
- In the Cloud Admin Portal, select the Incorta cluster. The cluster must be connected.
- In Cluster Details, select Configurations.
- If not previously enabled, enable the Data Agent by switching on the Enable Data Agent toggle.
- Select the Download Data Agent link.
When you enable the Data Agent from the Cloud Admin Portal, Incorta Cloud automatically enables and configures the Server Configurations for the cluster in the CMC.
Prepare the Data Agent host
Install Java or the OpenJDK
Before installing the data agent service on a host, you must first install Java or OpenJDK. The supported versions are:
- Oracle Java 8
- OpenJDK 8
- OpenJDK 11
You can download OpenJDK 11 from https://jdk.java.net/archive.
The host environment must have a `JAVA_HOME` system environment variable with a value set to the OpenJDK directory. The `PATH` environment variable must include:
- `$JAVA_HOME/bin` for Linux
- `%JAVA_HOME%\bin` for Windows
The Data Agent may not start normally on a Windows machine when the OpenJDK version is 11.0.1. The OpenJDK version must be upgraded to 11.0.2 or later.
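The Java prerequisites can be checked from a shell before installing the service. The following is a minimal sketch for a host with a POSIX-style shell. The function names and the install path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks for the Java setup on a data agent host.

# Return 0 if "$JAVA_HOME/bin" is one of the colon-separated entries in PATH.
# Note: a trailing slash on JAVA_HOME would make this comparison fail.
java_home_on_path() {
  [ -n "${JAVA_HOME:-}" ] || return 1
  case ":${PATH}:" in
    *":${JAVA_HOME}/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the banner printed by `java -version` on stdin and print the quoted
# version string from its first matching line, for example 11.0.2 or 1.8.0_281.
parse_java_version() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example:
#   export JAVA_HOME=/opt/jdk-11.0.2        # hypothetical install path
#   export PATH="$JAVA_HOME/bin:$PATH"
#   java_home_on_path && echo "PATH includes JAVA_HOME/bin"
#   java -version 2>&1 | parse_java_version
```

`java -version` writes its banner to standard error, which is why the example redirects it with `2>&1` before parsing.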
Unzip and install the Data Agent Service
You can install the data agent service on a host machine that runs Windows or Linux.
On a Windows host
Here are the steps to install the data agent service on a Windows host:
- Copy the `incorta.dataagent-X.Y.Z.zip` file to the Windows host.
- Unzip the `incorta.dataagent-X.Y.Z.zip` file to any local directory on the Windows host.
On a Linux host
Here are the steps to install the data agent service on a Linux host:
Secure copy the ZIP file to the Linux host. Here is an example:

    HOST_IP=192.168.128.100
    HOST_KEY_FILE=private.pem
    HOST_USER=incorta
    DATA_AGENT_FILE=incorta.dataagent-1.1.0.zip
    cd ~/Downloads
    scp -i ~/.ssh/${HOST_KEY_FILE} ${DATA_AGENT_FILE} ${HOST_USER}@${HOST_IP}:/tmp

Secure shell into the Linux host and unzip the `incorta.dataagent-X.Y.Z.zip` file to any local directory.

    ssh -i ~/.ssh/${HOST_KEY_FILE} ${HOST_USER}@${HOST_IP}

Unzip the ZIP file.

    DATA_AGENT_FILE=incorta.dataagent-1.1.0.zip
    cd /tmp
    unzip ${DATA_AGENT_FILE}

Configure the data agent service properties
The default memory size for the data agent is 2G in releases before 2024.7.x and 6G as of 2024.7.x. You can increase it by editing the `options.properties` file located in the unzipped data agent directory.
Here are the steps to edit it on a Linux host:
Secure shell into the Linux host.

    HOST_IP=192.168.128.100
    HOST_KEY_FILE=private.pem
    HOST_USER=incorta
    ssh -i ~/.ssh/${HOST_KEY_FILE} ${HOST_USER}@${HOST_IP}

Using Vim, or a similar editor, edit the `options.properties` file.

    DATA_AGENT_PATH=/tmp/incorta.dataagent/
    cd $DATA_AGENT_PATH
    vim options.properties

Modify the `memorySize` property (use the `i` keystroke for Insert mode).

    memorySize=8G

Save your changes to the file (use the `Esc` key to return to Normal mode, and the `:wq!` command to save).
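On hosts where an interactive editor is inconvenient, the same change can be scripted. The following is a minimal sketch, assuming that the property sits on its own line as `memorySize=<value>` and that the file ends with a newline. The function name `set_memory_size` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set memorySize in options.properties without opening an editor.
# Assumes the property sits on its own line as "memorySize=<value>".
# If the key is absent, it is appended as a new line.
set_memory_size() {
  local file="$1" size="$2" tmp
  tmp="$(mktemp)" || return 1
  if grep -q '^memorySize=' "$file"; then
    # Replace the existing value and leave every other line untouched.
    sed "s/^memorySize=.*/memorySize=${size}/" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf 'memorySize=%s\n' "$size" >> "$tmp"
  fi
  # Write back into the original file so its permissions are preserved.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
#   set_memory_size /tmp/incorta.dataagent/options.properties 8G
```

Writing to a temporary file first avoids relying on `sed -i`, whose syntax differs between GNU and BSD implementations.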
Deploy the MySQL driver
The Data Agent requires the MySQL driver, which is no longer included in the Data Agent package starting 2024.1.x. You can download the MySQL jar version 5.1.48 from the Maven repository and copy it to the required directories:
- For releases before 2024.7.x, you must copy the MySQL jar to `<unzipped_data_agent_path>/lib`.
- Starting 2024.7.x, you must copy the MySQL jar to the following directories:
  - `<unzipped_data_agent_path>/incorta.dataagent/lib`
  - `<unzipped_data_agent_path>/incorta.dataagent.controller/lib`
Releases 2024.1.4 and 2024.7.x introduce two scripts that automate the previous steps. From the unzipped `incorta.dataagent` directory, run one of the following scripts depending on the OS of the machine where you install the Data Agent:
- For Windows, run `patch-mysql.bat <unzipped_data_agent_path>`.
- For Linux, run `./patch-mysql.sh <unzipped_data_agent_path>`.
These scripts download the MySQL jar file version 5.1.48 from the Maven repository and deploy it to the required directories.
Check `<unzipped_data_agent_path>/patch-mysql.log` to inspect the script's output.
If you already have the MySQL driver downloaded, you can use the script to only copy the jar file to the required directories. Add the jar location to the script as follows:
- For Windows, run `patch-mysql.bat <unzipped_data_agent_path> <mysql_jar_location>`.
- For Linux, run `./patch-mysql.sh <unzipped_data_agent_path> <mysql_jar_location>`.
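If the data agent host cannot reach the Maven repository directly, the jar can be fetched on another machine and then passed to the script as `<mysql_jar_location>`. The following is a minimal sketch, assuming that `curl` is installed and that the jar is published under Maven Central's standard layout for `mysql:mysql-connector-java`. Verify the URL before relying on it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fetch the MySQL driver once, then hand the local jar to patch-mysql.sh.

# Print the download URL for a given mysql-connector-java version.
# Assumption: the jar is taken from Maven Central's standard repository layout.
mysql_jar_url() {
  local version="$1"
  printf 'https://repo1.maven.org/maven2/mysql/mysql-connector-java/%s/mysql-connector-java-%s.jar\n' \
    "$version" "$version"
}

# Download the jar for the given version into the given directory and print its path.
download_mysql_jar() {
  local version="$1" dest_dir="$2"
  local jar="${dest_dir}/mysql-connector-java-${version}.jar"
  curl -fsSL -o "$jar" "$(mysql_jar_url "$version")" && printf '%s\n' "$jar"
}

# Example (run from the unzipped incorta.dataagent directory):
#   jar="$(download_mysql_jar 5.1.48 /tmp)"
#   ./patch-mysql.sh <unzipped_data_agent_path> "$jar"
```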
Create a data agent in the Data Manager
You create a data agent in the Data Manager to authenticate and monitor a remote data agent service. Only a Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role for a given tenant can create a data agent instance that connects to a data agent service. A user that belongs to a group with the Schema Manager role can view the list of data agents in the Data Manager.
When you create a data agent in the Data Manager, you can generate and download an encrypted authentication file. The data agent service on the remote, on-premises host requires the generated .auth file.
Here are the steps to create a data agent in the Data Manager and generate the authentication file:
- Sign in to the tenant as the Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role.
- In the Navigation bar, select Data.
- In the Action bar, select + New > Add Data Agent.
- In the Create Data Agent dialog, enter a Data Agent Name and optionally enter a description.
- In the Generate Authentication File dialog, select Generate Now.
You can regenerate an authentication file from the Data Manager for a given data agent. The file contains all the information the data agent service needs to connect to the Incorta cluster. The file includes information regarding the various hosts, ports, and TLS/SSL certificates.
Copy the authentication file to the remote host
You must then copy the `.auth` file to the `conf` directory of the data agent service installation directory.
Starting 2024.7.x, you must copy the `.auth` file to the following directories:
- `<unzipped_data_agent_path>/incorta.dataagent/conf`
- `<unzipped_data_agent_path>/incorta.dataagent.controller/conf`
To copy the authentication file to the remote host:
- Secure copy or upload the `.auth` file to the Linux or Windows remote host.
- Move the `.auth` file to the required `conf` directory or directories, depending on your Incorta release.
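On a Linux host, the move step can be scripted so that the file lands in every directory the release expects. The following is a minimal sketch. It assumes that a package containing an `incorta.dataagent.controller` directory uses the 2024.7.x layout and that anything else uses the earlier single `conf` directory. The function names and the file name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: place a generated .auth file into the conf directory or directories.

# Print the conf directories that need the .auth file, one per line.
# Assumption: a package that contains incorta.dataagent.controller uses the
# 2024.7.x layout; anything else is treated as the earlier single-conf layout.
auth_target_dirs() {
  local agent_path="$1"
  if [ -d "${agent_path}/incorta.dataagent.controller" ]; then
    printf '%s\n' "${agent_path}/incorta.dataagent/conf" \
                  "${agent_path}/incorta.dataagent.controller/conf"
  else
    printf '%s\n' "${agent_path}/conf"
  fi
}

# Copy the .auth file into every target directory; stop at the first failure.
install_auth_file() {
  local agent_path="$1" auth_file="$2" dir
  auth_target_dirs "$agent_path" | while IFS= read -r dir; do
    cp "$auth_file" "${dir}/" || exit 1
  done
}

# Example (hypothetical file name):
#   install_auth_file /tmp/incorta.dataagent /tmp/my_agent.auth
```

The function copies rather than moves the file, so the uploaded original remains available until every target directory has received it.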
Start the data agent service
The steps required to start the data agent service vary according to the Incorta release. For releases before 2024.7.x, you start the data agent service via a script. Starting 2024.7.x, you start the Data Agent controller via a script, and then use the Data Manager in the Analytics platform to start, stop, or restart a data agent service.
Start the data agent service via a script for a Windows host (before 2024.7.x)
It is recommended that you use a service helper utility to monitor the data agent’s state and automatically restart it if it goes down. Here are the steps:
- Install a service helper utility if you do not already have one.
- Create a new service for the data agent and provide the full path to the agent.bat file.
- Start the service.
- Sign out from the host and sign back in to test that the service is still running.
- View the events for the service in the Event Viewer.
Start the data agent service via a script for a Linux host (before 2024.7.x)
- Secure shell into the remote host.
- Navigate to the installation directory of the data agent service.
- Run `./agent.sh start`.
Before upgrading to a newer data agent version, you must first stop the data agent service by running the `./agent.sh stop` command.
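When these commands are run from automation rather than by hand, a small wrapper can guard against running the script from the wrong directory. The following is a minimal sketch. The function name `agent_ctl` is hypothetical, and it accepts only the `start` and `stop` actions described in this section.

```shell
#!/usr/bin/env bash
# Sketch: run ./agent.sh with the given action from the data agent installation directory.
# Only "start" and "stop" are accepted because those are the actions described in this section.
agent_ctl() {
  local install_dir="$1" action="$2"
  case "$action" in
    start|stop) ;;
    *) echo "usage: agent_ctl <install_dir> start|stop" >&2; return 2 ;;
  esac
  if [ ! -x "${install_dir}/agent.sh" ]; then
    echo "agent.sh not found or not executable in ${install_dir}" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$install_dir" && ./agent.sh "$action" )
}

# Example: stop the service before unpacking a newer package.
#   agent_ctl /tmp/incorta.dataagent stop
```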
Confirm data agent service connection in the Data Manager
After starting the data agent, you can check its status in the Data Manager.
- Sign in to the tenant as the Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role.
- In the Navigation bar, select Data.
- In the Action bar tab, select Data Agents.
- Verify that the status of the data agent is connected.
Start the data agent via the Analytics platform (starting 2024.7.x)
To use the Data Manager to start or stop the data agent service on the remote host, you must start the Data Agent Controller.
Here are the required steps to start the Controller:
- On the remote host, navigate to the Data Agent’s unzipped directory.
- In the `incorta.dataagent.controller` directory, run one of the following scripts depending on the host machine’s OS:
  - Linux: `./bin/controller.sh start`
  - Windows: `bin/controller.bat start`
You can also use a service helper utility on a Windows host to monitor the state of the Data Agent Controller and automatically restart it if it goes down.
After starting the Controller, you can use the Data Manager to start the data agent. Here are the steps:
- Sign in to the tenant as the Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role.
- In the Navigation bar, select Data.
- In the Action bar tab, select Data Agents.
- For the data agent you have created, select Start.
Create or edit an external data source using the data agent
You can now create a new external data source or edit an existing one using the data agent. To learn more about how to create and edit an external data source, review Tools → Data Manager.
- In the Create Data Source or Edit Data Source dialog, enable the Use Data Agent toggle.
- For the Data Agent property, in the dropdown list, select the data agent.
- Specify a connection string that is accessible to the host of the data agent service and that includes the private IP or private DNS, such as `127.0.0.1` for a local host or `192.168.128.100` (replace as required) for a database that is on the same subnet as the data agent host.
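A quick way to sanity-check the address used in a connection string is to test whether it falls in a loopback or RFC 1918 private range. The following is a minimal sketch that matches on address prefixes only and does not validate the full IPv4 format. The function name and the JDBC URL in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the argument looks like a loopback or RFC 1918 private IPv4 address.
# Prefix matching only: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
is_private_ipv4() {
  case "$1" in
    127.*|10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (hypothetical database and port):
#   is_private_ipv4 192.168.128.100 && echo "jdbc:mysql://192.168.128.100:3306/sales"
```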