Tools → Data Agent
About a Data Agent
Rather than opening a VPN or SSH tunnel between your external database and your Incorta cluster, you can install and configure a Data Agent service to run on the database host or on a host that is on the same subnet as the database host. Typically, the database host resides behind a corporate firewall or on another interdepartmental subnet.
The Data Agent service supports the following data sources:
- Apache Drill
- Apache Hive
- Athena
- Cassandra (Simba)
- Custom SQL
- Data Lake Local Files
- IBM DB2
- MongoDB BI
- MySQL
- Netezza
- NetSuite SuiteAnalytics
- Oracle
- PostgreSQL
- Presto
- RedShift
- SAP ERP
- SAP Hana
- SAP Sybase IQ
- SQL Server
- SQL Server (JTDS)
- Teradata
- Vertica
The data agent service enables the extraction of data from one or more databases behind a firewall to an Incorta cluster. This means that a single data agent can connect to multiple data sources in your organization. Your Incorta cluster can reside on-premises or in the cloud.
The connection between Incorta and a data agent service uses TLS/SSL. Authentication requires a valid CA certificate or self-signed certificate. To learn more about TLS/SSL, please review Security → HTTPS for Apache Tomcat with OpenSSL. The data agent encodes data for transfer using Google's ProtoBuf library.
A CMC Administrator must enable and configure an Incorta cluster to support the use of Data Agents. Only a Tenant Administrator (Super User) or a user that belongs to a group with the SuperRole role for a given tenant can create a data agent that connects to a data agent service. For a given data agent, the Tenant Administrator or similar user must generate an authentication file. A system administrator then copies the generated authentication file to the conf directory of the data agent service installation on the remote host. With the authentication file in place, a system administrator starts the data agent service on the remote host. A Tenant Administrator or similar user must then confirm the connected status of the Data Agent service in the Data Manager. Once connected, a user that belongs to a group with the Schema Manager or SuperRole role can create an external data source using the required database connector and the connected data agent.
Requirements for the Data Agent service host
Here are the requirements to run a Data Agent service on a host:
- Minimum of 16 GB RAM and 4 CPUs
- Ability to install Java or OpenJDK
- The host must not block outgoing connections
- The Incorta cluster must allow for two additional incoming ports
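The memory and CPU minimums above can be checked from a shell before installing. The following is a minimal sketch, not an Incorta tool: the function name meets_minimums is hypothetical, and the commented usage assumes a Linux host that provides /proc/meminfo and nproc.

```shell
# Minimums taken from the requirements list above.
MIN_MEM_GB=16
MIN_CPUS=4

# meets_minimums MEM_KB CPU_COUNT   (hypothetical helper)
# Returns 0 when both values satisfy the minimums, 1 otherwise.
meets_minimums() {
  mem_kb=$1
  cpus=$2
  # Convert kB to whole GB, rounding up, because the kernel reports
  # slightly less memory than is physically installed.
  mem_gb=$(( (mem_kb + 1048575) / 1048576 ))
  [ "$mem_gb" -ge "$MIN_MEM_GB" ] && [ "$cpus" -ge "$MIN_CPUS" ]
}

# Example on a Linux host:
#   meets_minimums "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)" \
#     && echo "host meets the minimums"
```

Taking the values as arguments keeps the check itself independent of how a given operating system reports memory and CPU counts.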
There are two configurations available for the Data Agent:
- on-premises
- cloud.incorta.com
You can download the Data Agent from the Cloud Console for an Incorta cluster. For on-premises, you must contact Incorta Support directly for the download file.
Enable the Data Agent for a cloud.incorta.com Incorta cluster
When you enable the Data Agent from the Cloud Console, cloud.incorta.com automatically enables and configures the server configurations for the cluster in the Cluster Management Console.
- Sign in to your cloud console at http://cloud.incorta.com as the Cloud Administrator.
- In the cloud console, select an Incorta cluster.
- In Cluster Details, enable the Data Agent.
- In Data Agent, select download.
Enable the Data Agent for an on-premises Incorta cluster
A data agent service connects to the Analytics Service and Loader Service through specific ports. The data agent service host must be able to receive incoming and outgoing communications over the specified ports.
After you enable the Data Agent feature, you must restart the Incorta cluster. Changes to individual Agent Ports require restarting the related Analytics or Loader Services.
- Sign in as the CMC Administrator.
- In the Clusters Manager, select the cluster.
- In the Cluster Manager, select Cluster Configurations.
- In Server Configurations, in the left panel, select Data Agent.
- Toggle on the Enable Data Agent property.
- Specify the Data Agent properties:
  - Analytics Data Agent Port
  - Loader Data Agent Port
  - Analytics Public Hosts and Ports
  - Loader Public Hosts and Ports
- Select Save.
Depending on your requirements, the HOST_IP or HOST_DNS can be a PUBLIC_IP, PUBLIC_DNS, PRIVATE_IP, or PRIVATE_DNS.
Property | Description |
---|---|
Analytics Data Agent Port | The Analytics Service listens to a data agent service on this local port |
Loader Data Agent Port | The Loader Service listens to a data agent service on this local port |
Analytics Public Hosts and Ports | The HOST_IP or HOST_DNS and the port. The data agent service connects to the Analytics Service using this HOST:PORT. The connection is forwarded to the specified Analytics Data Agent Port. |
Loader Public Hosts and Ports | The HOST_IP or HOST_DNS and the port. The data agent service connects to the Loader Service using this HOST:PORT. The connection is forwarded to the specified Loader Data Agent Port. |
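Because the data agent service must reach both public HOST:PORT values, it can help to validate them and test plain TCP reachability from the data agent host. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the host name and port in the usage comment are placeholders rather than Incorta defaults, can_reach assumes GNU timeout and a bash build with /dev/tcp support, and IPv6 literals are not handled. A successful TCP connection does not verify the TLS/SSL handshake.

```shell
# split_host_port HOST:PORT   (hypothetical helper)
# Prints "HOST PORT" and returns 0 when the value is well formed, else returns 1.
split_host_port() {
  value=$1
  host=${value%:*}
  port=${value##*:}
  # Reject values with no colon or an empty host.
  [ "$host" != "$value" ] && [ -n "$host" ] || return 1
  # Reject an empty or non-numeric port.
  case $port in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Reject ports outside the valid TCP range.
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ] || return 1
  printf '%s %s\n' "$host" "$port"
}

# can_reach HOST PORT   (hypothetical helper)
# Tries to open a TCP connection within 5 seconds using bash's /dev/tcp.
can_reach() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example, run from the data agent host (placeholder host and port):
#   hp=$(split_host_port analytics.example.com:8500) && can_reach ${hp% *} ${hp#* }
```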
For an on-premises installation of the Data Agent, you must contact Incorta Support directly for the download binary.
Install Java or the OpenJDK
Before installing the data agent service on a host, you must first install Java. The supported versions of Java are:
- Oracle Java 8
- OpenJDK 8
- OpenJDK 11
You can download OpenJDK 11 from https://jdk.java.net/archive/
The host environment must have a JAVA_HOME system environment variable with a value set to the OpenJDK directory. The PATH environment variable must include:
- $JAVA_HOME/bin for Linux
- %JAVA_HOME%\bin for Windows
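On a Linux host, the two environment requirements can be verified from the shell. The following is a minimal sketch: the function name java_env_ok is hypothetical, and the JDK path in the commented example is a placeholder for wherever the JDK is installed.

```shell
# Example for a shell profile (the JDK path is a placeholder):
#   export JAVA_HOME=/opt/jdk-11
#   export PATH="$JAVA_HOME/bin:$PATH"

# java_env_ok   (hypothetical helper)
# Returns 0 when JAVA_HOME is set and its bin directory is on PATH.
java_env_ok() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  # Strip a trailing slash so /opt/jdk-11/ and /opt/jdk-11 compare equal.
  java_bin="${JAVA_HOME%/}/bin"
  # Wrap PATH in colons so every entry, including the first and last, matches.
  case ":$PATH:" in
    *":$java_bin:"*) return 0 ;;
    *) echo "$java_bin is not on PATH" >&2; return 1 ;;
  esac
}
```

Running java_env_ok in the same shell session that will start the data agent service shows whether that session sees the expected values.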
Install the data agent service on a Windows host
Here are the steps to install the data agent service for a Windows host:
- Copy the incorta.dataagent-X.Y.Z.zip download file to the Windows host.
- Unzip the incorta.dataagent-X.Y.Z.zip file to any local directory on the Windows host.
Install the data agent service on a Linux host
Here are the steps to install the data agent service for a Linux host:
- Secure copy the download file to the Linux host. Here is an example:
    HOST_IP=192.168.128.100
    HOST_KEY_FILE=private.pem
    HOST_USER=incorta
    DATA_AGENT_FILE=incorta.dataagent-1.1.0.zip
    cd ~/Downloads
    scp -i ~/.ssh/${HOST_KEY_FILE} ${DATA_AGENT_FILE} ${HOST_USER}@${HOST_IP}:/tmp
- Secure shell into the Linux host.
    ssh -i ~/.ssh/${HOST_KEY_FILE} ${HOST_USER}@${HOST_IP}
- Unzip the incorta.dataagent-X.Y.Z.zip file to any local directory. This example unzips the file in /tmp:
    DATA_AGENT_FILE=incorta.dataagent-1.1.0.zip
    cd /tmp
    unzip ${DATA_AGENT_FILE}
Configure data agent service properties
The default memory size for the data agent is 2G. You can increase this to a higher amount, such as 4G of memory. Here are the steps:
- Secure shell into the Linux host.
    HOST_IP=192.168.128.100
    HOST_KEY_FILE=private.pem
    HOST_USER=incorta
    ssh -i ~/.ssh/${HOST_KEY_FILE} ${HOST_USER}@${HOST_IP}
- Using Vim, or a similar editor, edit the options.properties file.
    DATA_AGENT_PATH=/tmp/incorta.datagent/
    cd $DATA_AGENT_PATH
    vim options.properties
- Modify the memorySize property (press i to enter Insert mode).
    memorySize=4G
- Save your changes to the file (press Esc to return to Normal mode, then type :wq! to save and quit).
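If you prefer a non-interactive edit, for example in a provisioning script, the same change can be made with sed. The following is a minimal sketch: the function name set_memory_size is hypothetical, and it assumes that options.properties holds the setting as a single memorySize=VALUE line, as in the example above.

```shell
# set_memory_size FILE SIZE   (hypothetical helper)
# Rewrites the memorySize line in FILE, or appends one if none exists.
set_memory_size() {
  file=$1
  size=$2
  tmp=$(mktemp) || return 1
  if grep -q '^memorySize=' "$file"; then
    sed "s/^memorySize=.*/memorySize=${size}/" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf 'memorySize=%s\n' "$size" >> "$tmp"
  fi
  # Write back into the original file so its owner and permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
#   set_memory_size "$DATA_AGENT_PATH/options.properties" 4G
```

Writing through a temporary file avoids sed's in-place flag, whose syntax differs between GNU and BSD sed.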
Create a data agent in the Data Manager
You create a data agent in the Data Manager to authenticate and monitor a remote data agent service. Only a Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role for a given tenant can create a data agent instance that connects to a data agent service. A user that belongs to a group with the Schema Manager role can see a list of data agents in the Data Manager.
When you create a data agent in the Data Manager, you can generate and download an encrypted authentication file. The data agent service on the remote, on-premises host requires the generated .auth file. You must then copy the .auth file to the conf directory of the data agent service installation.
Here are the steps to create a data agent in the Data Manager:
- Sign in to the tenant as the Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role.
- In the Navigation bar, select Data.
- In the Action bar, select + New > Add Data Agent.
- In the Create Data Agent dialog, enter a Data Agent Name and optionally enter a description.
- In the Generate Authentication File, select Generate Now.
You can regenerate an authentication file from the Data Manager for a given data agent. The file contains all the information needed by the data agent service to connect to the Incorta cluster. The file includes information regarding the various hosts, ports, and TLS/SSL certificates.
Copy the authentication file to the remote host
- Secure copy or upload the .auth file to the remote Linux or Windows host.
- Move the .auth file to the conf directory of the data agent service installation.
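For a Linux host, the two steps above can be sketched as follows. This is an illustration rather than an Incorta utility: the function name install_auth_file and the file name my_agent.auth are hypothetical, and the chmod call is an optional precaution that is not part of the documented steps.

```shell
# Step 1, run from your workstation (file name is a placeholder):
#   scp -i ~/.ssh/${HOST_KEY_FILE} my_agent.auth ${HOST_USER}@${HOST_IP}:/tmp

# install_auth_file AUTH_FILE AGENT_DIR   (hypothetical helper)
# Step 2, run on the remote host: moves AUTH_FILE into AGENT_DIR/conf.
install_auth_file() {
  auth_file=$1
  conf_dir=$2/conf
  case $auth_file in
    *.auth) ;;
    *) echo "expected a .auth file: $auth_file" >&2; return 1 ;;
  esac
  [ -f "$auth_file" ] || { echo "not found: $auth_file" >&2; return 1; }
  [ -d "$conf_dir" ] || { echo "no conf directory in $2" >&2; return 1; }
  mv "$auth_file" "$conf_dir/" || return 1
  # Optional precaution (an assumption, not a documented step):
  # restrict the file to its owner.
  chmod 600 "$conf_dir/$(basename "$auth_file")"
}

# Example:
#   install_auth_file /tmp/my_agent.auth "$DATA_AGENT_PATH"
```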
Start the data agent service for a Windows host
It is recommended that you use a service helper utility to monitor the data agent’s state and automatically restart it if it goes down. Here are the steps:
- Install a service helper utility if you do not already have one.
- Create a new service for the data agent and provide the full path to the agent.bat file.
- Start the service.
- Log out of the host and log back in to test that the service is still running.
- View the events for the service in the Event Viewer.
Start the data agent service for a Linux host
- Secure shell into the remote host.
- Navigate to the installation directory of the data agent service.
- Run ./agent.sh start
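Because the service needs the authentication file before it can connect, a short preflight check before the start command can catch a missing file early. The following is a minimal sketch: the function name agent_ready is hypothetical, and it only inspects the installation directory without contacting the Incorta cluster.

```shell
# agent_ready AGENT_DIR   (hypothetical helper)
# Returns 0 when AGENT_DIR holds agent.sh and conf/ holds at least one .auth file.
agent_ready() {
  agent_dir=$1
  if [ ! -f "$agent_dir/agent.sh" ]; then
    echo "agent.sh not found in $agent_dir" >&2
    return 1
  fi
  for f in "$agent_dir"/conf/*.auth; do
    # An unmatched glob stays literal, so confirm that the file really exists.
    if [ -f "$f" ]; then
      return 0
    fi
  done
  echo "no .auth file in $agent_dir/conf" >&2
  return 1
}

# Example:
#   agent_ready "$DATA_AGENT_PATH" && (cd "$DATA_AGENT_PATH" && ./agent.sh start)
```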
Confirm data agent service connection in the Data Manager
- Sign in to the tenant as the Tenant Administrator (Super User) or user that belongs to a group with the SuperRole role.
- In the Navigation bar, select Data.
- In the Action bar, select the Data Agents tab.
- Verify the status of the data agent as connected.
Create or edit an external data source using the data agent
You can now create a new external data source or edit an existing one using the data agent. To learn more about how to create and edit an external data source, please review Tools → Data Manager.
- In the Create or Edit Data Source dialog, enable the Use Data Agent toggle.
- For the Data Agent property, in the drop down list, select the data agent.
- Specify a connection string that is accessible from the host of the data agent service. The connection string includes a Private IP or Private DNS, such as 127.0.0.1 for a database on the local host or 192.168.128.100 (replace as required) for a database that is on the same subnet as the data agent host.
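As an illustration of such a connection string, the sketch below assembles a standard JDBC URL for MySQL or PostgreSQL from a private address. It assumes that the connector accepts a JDBC URL. The function name jdbc_url and the database name sales are hypothetical, and other connectors use their own URL formats.

```shell
# jdbc_url KIND HOST PORT DATABASE   (hypothetical helper)
# Prints a JDBC URL for the given database kind, or returns 1 if unsupported.
jdbc_url() {
  case $1 in
    mysql|postgresql)
      printf 'jdbc:%s://%s:%s/%s\n' "$1" "$2" "$3" "$4"
      ;;
    *)
      echo "unsupported kind: $1" >&2
      return 1
      ;;
  esac
}

# Database on the same subnet as the data agent host (3306 is the MySQL default port):
jdbc_url mysql 192.168.128.100 3306 sales
# Database on the data agent host itself (5432 is the PostgreSQL default port):
jdbc_url postgresql 127.0.0.1 5432 sales
```

The address is resolved from the data agent host, not from the Incorta cluster, which is why a private IP such as 127.0.0.1 is valid here.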