AWS High Availability Incorta Cluster Guide
Guide: Install, Configure, and Deploy a High Availability Incorta Cluster in AWS
This guide describes how to install, configure, and deploy a high availability Incorta Cluster in the Amazon Web Services (AWS) cloud. The Cluster will support an active-active topology and will be public facing.
The Cluster topology in this guide includes 5 EC2 hosts for Incorta, 2 EC2 hosts for Apache ZooKeeper, an Elastic File System (EFS) mount for each Incorta Node host, and Load Balancers that provide access to the Analytics Services through DNS addresses.
The 5 EC2 hosts for Incorta run the following applications and services:
- 1 host for:
- Incorta Cluster Management Console (CMC)
- Apache Spark (2.4.7, Incorta Node build)
- MySQL 8
- Apache ZooKeeper 3.6.1
- 4 hosts for Incorta Nodes:
- 1 host for an Incorta Node that runs a Loader Service, LoaderService_1
- 1 host for an Incorta Node that runs a Loader Service, LoaderService_2
- 1 host for an Incorta Node that runs an Analytics Service, AnalytcisService_1
- 1 host for an Incorta Node that runs an Analytics Service, AnalytcisService_2
The 2 EC2 hosts for Apache ZooKeeper run the following applications and services:
- 1 host for an Apache ZooKeeper 3.6.1 server
- 1 host for an Apache ZooKeeper 3.6.1 server
The Incorta specific portion of the procedure begins at Install and Start the Cluster Management Console.
AWS Prerequisites
For the selected AWS Region, the prerequisite configurations include the following:
- A Virtual Private Cloud (VPC) with a Subnet associated with your VPC and a specific Availability Zone
- A Route Table with Internet Gateway and Network ACL for the Subnet
- A Security Group with defined inbound and outbound rules for NFS, TCP, and HTTP traffic
- An IAM Role with an attached policy for AmazonElasticFileSystemFullAccess
- An Elastic File System (EFS) file system for the VPC with the related Subnet and Security Group, resulting in a File System ID and DNS Name (see the sketch after this list)
- 7 EC2 hosts running in the VPC with the same Subnet, Placement Group, IAM Role, Security Group, and the same Key Pair .pem file
- 5 of the 7 EC2 hosts with the minimum configuration (m5a.xlarge)
- Amazon Linux 2 AMI x86
- 4 vCPUs
- 16 GiB memory
- Up to 5 Gbps network bandwidth
- 30 GiB storage
- 2 of the 7 EC2 hosts running with the minimum configuration (t2.large)
- Amazon Linux 2 AMI x86
- 2 vCPUs
- 8 GiB memory
- Low to moderate network performance
- 20 GiB storage
- A Classic Load Balancer in the same VPC with the same Availability Zone Subnet that:
- Is internet facing
- Supports HTTP on port 8080 with Load Balancer Stickiness on port 8080
- Specifies the 2 EC2 instances that individually host an Incorta Node which runs the Analytics Service
- Specifies a Health Check over TCP on port 8080
- A Network Load Balancer in the same VPC with the same Availability Zone Subnet that:
- Is internet facing
- Supports TCP on ports 5436 and 5442
- Specifies a Target Group for the 2 EC2 instances that individually host an Incorta Node running the Analytics Service, with a configured TCP Health Check
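For reference, the EFS prerequisite can also be created from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is configured for your account and region; incorta-efs, subnet-xxxxxxxx, and sg-xxxxxxxx are placeholder names and IDs:
aws efs create-file-system --creation-token incorta-efs --tags Key=Name,Value=incorta-efs
aws efs create-mount-target --file-system-id <efs-ID> --subnet-id subnet-xxxxxxxx --security-groups sg-xxxxxxxx
aws efs describe-file-systems --file-system-id <efs-ID>
The describe-file-systems call returns the File System ID used as <efs-ID> throughout this guide; the corresponding DNS name follows the pattern <efs-ID>.efs.<region>.amazonaws.com.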
AWS Host Access
This guide assumes that you can readily access all 7 EC2 hosts using a bash shell terminal that supports both Secure Shell (SSH) and Secure Copy (SCP) using the shared Key Pair PEM file.
Incorta Installation Zip
This guide assumes that you have already downloaded the incorta-package_<version>.zip file and can securely copy this file using SCP to the 5 EC2 hosts that will run either an Incorta Node or the Incorta Cluster Management Console.
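For example, a minimal sketch of copying the package to Host_1 with SCP, assuming the zip file is in your local working directory and the HOST_n shell variables defined later in this guide (repeat for Host_2 through Host_5):
scp -i ~/.ssh/<ssh-auth-file>.pem incorta-package_<version>.zip ec2-user@${HOST_1}:/tmp/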
EC2 Host Configuration Information
The relevant information for each EC2 host used as an example is tabulated here. This information will be used to complete the creation of the cluster.
EC2 Hosts, Nodes, Users, Applications, and Services
EC2_Host | Node Name | Linux User/Applications/Services |
Host_1 | CMC_MySQL_Spark_ZooKeeper_1 | ● incorta user ● OpenJDK 11 ● EFS Mount and Directory ● Apache ZooKeeper ● Incorta Cluster Management Console ● MySQL 8 ● Apache Spark |
Host_2 | IncortaNodeLoader_1 | ● incorta user ● OpenJDK 11 ● EFS Mount and Directory ● Incorta Node ● Incorta Loader Service 1 |
Host_3 | IncortaNodeLoader_2 | ● incorta user ● OpenJDK 11 ● EFS Mount and Directory ● Incorta Node ● Incorta Loader Service 2 |
Host_4 | IncortaNodeAnalytics_1 | ● incorta user ● OpenJDK 11 ● EFS Mount and Directory ● Incorta Node ● Incorta Analytics Service 1 |
Host_5 | IncortaNodeAnalytics_2 | ● incorta user ● OpenJDK 11 ● EFS Mount and Directory ● Incorta Node ● Incorta Analytics Service 2 |
Host_6 | ZooKeeper_2 | ● incorta user ● OpenJDK 11 ● Apache ZooKeeper |
Host_7 | ZooKeeper_3 | ● incorta user ● OpenJDK 11 ● Apache ZooKeeper |
EC2 Host IP and DNS Addresses
Each EC2 Host in this public facing cluster has a:
- Private DNS
- Private IP
- Public DNS (IPv4)
- IPv4 Public IP
In this document, the following IP address placeholders are used for the seven hosts. You will need to tabulate the actual IP addresses for your specific environment.
EC2 Host | Private DNS | Private IP | Public DNS (IPv4) | IPv4 Public IP |
Host_1 | <HOST_1_Private_DNS> | <HOST_1_Private_IP> | <HOST_1_Public_DNS_IPv4> | <HOST_1_IPv4_Public_IP> |
Host_2 | <HOST_2_Private_DNS> | <HOST_2_Private_IP> | <HOST_2_Public_DNS_IPv4> | <HOST_2_IPv4_Public_IP> |
Host_3 | <HOST_3_Private_DNS> | <HOST_3_Private_IP> | <HOST_3_Public_DNS_IPv4> | <HOST_3_IPv4_Public_IP> |
Host_4 | <HOST_4_Private_DNS> | <HOST_4_Private_IP> | <HOST_4_Public_DNS_IPv4> | <HOST_4_IPv4_Public_IP> |
Host_5 | <HOST_5_Private_DNS> | <HOST_5_Private_IP> | <HOST_5_Public_DNS_IPv4> | <HOST_5_IPv4_Public_IP> |
Host_6 | <HOST_6_Private_DNS> | <HOST_6_Private_IP> | <HOST_6_Public_DNS_IPv4> | <HOST_6_IPv4_Public_IP> |
Host_7 | <HOST_7_Private_DNS> | <HOST_7_Private_IP> | <HOST_7_Public_DNS_IPv4> | <HOST_7_IPv4_Public_IP> |
Shared Storage (Network File Sharing)
In this Incorta Cluster, the 5 EC2 Hosts that run Incorta need to be able to access Shared Storage using Amazon Elastic File System (EFS). EFS sharing is set up by mounting a designated disk partition on all of the hosts requiring access to the shared data. EFS designates a shared device as an EFS ID and EFS directory pair. In this document, the following placeholders are used:
EFS identifier : <efs-ID>
EFS directory : <efs-shared-dir>
Linux Users
This guide references 3 Linux Users for bash shell commands: ec2-user, root, incorta
ec2-user
The ec2-user is a standard, unprivileged user available to the EC2 Amazon Linux 2 AMI host.
root
The root user is a privileged user available to the EC2 Amazon Linux 2 AMI host. In this document, the root user installs and runs some applications and services. This is not a requirement.
incorta
The incorta user is a standard, unprivileged user that you create. The incorta user will install certain applications and services and will own certain directories.
Summary of Procedures
All hosts require the procedure for updating the EC2 packages, creating the incorta Linux user, and installing and configuring Java OpenJDK 11.
For Host_1, Host_6, and Host_7, you will install and configure an Apache ZooKeeper ensemble.
For Host_1, Host_2, Host_3, Host_4 and Host_5, you will create the IncortaAnalytics directory and create an EFS mount for Shared Storage.
For Host_1, you will install MySQL 8 and create the Incorta Metadata database. In addition, you will install the Incorta Cluster Management Console (CMC), and install and configure the Incorta supplied version of Apache Spark.
For Host_2, Host_3, Host_4 and Host_5, you will install an Incorta HA Node.
Using the Cluster Management Console, you will create a cluster, federate nodes, and install either a Loader or Analytics service on each federated node. You will then start the Incorta Cluster. Next, you will create and configure an example tenant with sample data. You will then access the tenant and perform a full data load for a given schema. After successfully loading the data for the schema, you will view a Dashboard based on that data. Successfully viewing a Dashboard indicates your Incorta Cluster is operational.
You will verify SQLi connectivity to Incorta by configuring a SQL tool to access the tenant through one and then the other Analytics Service.
With your Incorta Cluster verified to be running, you will access Incorta through a public DNS for a Network Load Balancer over TCP. You will follow steps to stop an Incorta service, confirm access to the cluster, and then restart the Incorta service.
Secure Shell
For Secure Shell access to EC2 hosts, AWS asks that you create a Key Pair and download a .pem file. PEM (Privacy Enhanced Mail) is a base64 container format for encoding keys and certificates.
This guide assumes that you have a .pem file installed under the ~/.ssh/ directory on Mac OS or Linux.
If you are using a Windows SSH client such as PuTTY, you will need to convert the .pem format into the .ppk format (PuTTY Private Key) using PuTTYgen.
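SSH will refuse a private key whose permissions are too open. If you see an UNPROTECTED PRIVATE KEY FILE warning when connecting, restrict the key file's permissions first:
chmod 400 ~/.ssh/<ssh-auth-file>.pem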
Create Shell Variables for Mac OS or Linux
In PuTTY or another Windows SSH client, you can create and save an SSH connection for each of the seven EC2 Hosts.
To expedite secure shell access between Mac OS or Linux and the EC2 Hosts, create the following variables to store the IPv4_Public_IP values for each host.
To begin, open Terminal and define the following variables using the IPv4_Public_IP values for the EC2 Hosts. The Public IPs in the following are for illustration.
HOST_1=34.155.100.1
HOST_2=34.155.100.2
HOST_3=34.155.100.3
HOST_4=34.155.100.4
HOST_5=34.155.100.5
HOST_6=34.155.100.6
HOST_7=34.155.100.7
Next confirm that you can log in to Host_1 as the ec2-user using the shell variable.
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_1}
When prompted to confirm the host's authenticity, enter yes and press Return.
Then, after successfully connecting to the EC2 Host, exit your terminal:
exit
Repeat for Host_2, Host_3, Host_4, Host_5, Host_6, and Host_7:
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_2}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_3}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_4}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_5}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_6}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_7}
exit
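As an optional shortcut, you can confirm reachability of all seven hosts in a single pass with a bash loop; a minimal sketch, assuming the shell variables defined above (the -o StrictHostKeyChecking=accept-new option requires a recent OpenSSH; omit it to be prompted for each host):
for H in ${HOST_1} ${HOST_2} ${HOST_3} ${HOST_4} ${HOST_5} ${HOST_6} ${HOST_7}; do
  ssh -o StrictHostKeyChecking=accept-new -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${H} hostname
done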
Create the Incorta Group and User
The purpose of this section is to create a user and group for running and managing the Incorta software. In this document, the user and group are both called incorta. This is just an example. You should create a user and group name that matches your own needs.
You should also have the secure shell authentication file installed under the ~/.ssh/ directory. In this case, the file is identified as <ssh-auth-file>.pem. Use this as a placeholder for your own file.
Start with Host_1. Log in as the ec2-user:
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_1}
Create the incorta user and group for each of the seven hosts. You will do this from the bash command line after logging in with ssh as the default user, ec2-user.
sudo groupadd -g 1220 incorta
sudo useradd -u 1220 -g incorta incorta
Give the incorta user permission to use sudo. Back up the original file first.
sudo cp -r /etc/sudoers /etc/sudoers.bk
Use visudo to set values in the /etc/sudoers file:
sudo visudo
After the line reading root ALL=(ALL) ALL, add the line incorta ALL=(ALL) NOPASSWD: ALL. The result should look like the following:
root ALL=(ALL) ALL
incorta ALL=(ALL) NOPASSWD: ALL
Set up the hosts so you can log in directly as the incorta user by copying the .ssh directory from the ec2-user home directory to the incorta home directory:
sudo cp -rp ~/.ssh /home/incorta
sudo chown -R incorta:incorta /home/incorta/.ssh
Log out of the host:
exit
Repeat this procedure for setting up the incorta user on Host_2, Host_3, Host_4, Host_5, Host_6 and Host_7.
Install Java
Update and Install Existing Packages
This section assumes you have created a group and user ID as described in Create the Incorta Group and User. In this document, you continue to use incorta as the user.
You need to make sure the necessary components are up to date for each host. You will also need to add additional utilities to support Incorta.
Log in as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Update the existing packages:
sudo yum -y update
Install utilities required for supporting various applications needed for Incorta:
sudo yum -y install telnet
sudo yum -y install expect
Install the Java OpenJDK 11
Many of the software components required to run Incorta are Java applications. You will need to add the Java OpenJDK and associated components for each of the seven hosts to run the required software.
Install the Java OpenJDK 11 for Amazon Linux 2:
sudo amazon-linux-extras install java-openjdk11
Install the OpenJDK 11 development package:
sudo yum -y install java-11-openjdk-devel
Update the Java alternatives:
sudo update-alternatives --config javac
From the command output, note the path shown in the java-11-openjdk.x86_64 row. You will use this path, with the trailing /bin/javac removed, as the JAVA_HOME environment variable in the custom.sh file created in a later step.
Example of the desired path: /usr/lib/jvm/java-11-openjdk-<JDK_VERSION>
Accept the default by pressing Enter.
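If you prefer to resolve the JDK home directory non-interactively, the following one-liner prints the same path, assuming javac is already on the PATH:
readlink -f $(which javac) | sed 's|/bin/javac||'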
Confirm the version of Java matches what you just installed:
java -version
openjdk version "11.0.5" 2019-10-15 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.5+10-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode, sharing)
Install additional packages to support the Java installation:
sudo yum -y install gcc
sudo yum -y install byacc
sudo yum -y install flex bison
Make it so the JAVA_HOME environment variable is set when logging in to the host. Do this by creating a shell script file that will be run at login time. Create a file called custom.sh in the directory /etc/profile.d using the editor of your choice. You will need root privileges to do this. For example:
sudo vim /etc/profile.d/custom.sh
Then add the following to the file:
Use the path you noted from the update-alternatives command in the previous steps.
#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-<JDK_VERSION>
export PATH=$PATH:$JAVA_HOME/bin
Verify that the environment variables, JAVA_HOME and PATH, have defined values for the incorta user.
source ~/.bash_profile
echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.amzn2.x86_64
You will need to add more environment variable settings to custom.sh in the procedures that follow.
Log out of the host.
exit
Repeat these steps for installing Java (Install Java) on the remaining hosts (Host_2, Host_3, Host_4, Host_5, Host_6 and Host_7).
Create the Incorta Installation Directory
As the incorta Linux user, create the Incorta default installation directory on Host_1, Host_2, Host_3, Host_4, and Host_5. All Incorta components will be installed in the directory /home/incorta/IncortaAnalytics/.
You must create this directory before starting the Incorta installer. The Incorta installer will fail if a valid directory is not specified at installation time.
Log in to Host_1 as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Create the IncortaAnalytics directory:
mkdir IncortaAnalytics
Log out of Host_1.
exit
Repeat these steps for Host_2, Host_3, Host_4, and Host_5.
Setting Up Shared Storage
A common network file storage mount is one way to define Shared Storage in an Incorta Cluster topology. In AWS, network file storage is Elastic File System (EFS). With EFS, EC2 hosts can share files.
In order to share files between Incorta Nodes and Apache Spark in the Incorta Cluster, you must first install the Amazon EFS utility on Host_1, Host_2, Host_3, Host_4, and Host_5. Host_6 and Host_7 do not require this package as they do not need to access Shared Storage.
To begin, log in to Host_1 as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Install the Amazon EFS utility:
sudo yum -y install amazon-efs-utils
Log out of Host_1.
exit
Repeat these steps for Host_2, Host_3, Host_4, and Host_5.
Create the EFS Mount for Host_1
Important: The procedure for Host_1 differs from Host_2, Host_3, Host_4, and Host_5.
Both the Cluster Management Console (CMC) and Apache Spark on Host_1 require access to Shared Storage.
First create the EFS mount and then create the Tenants directory in the EFS mount.
AWS hosts facilitate file sharing by providing a file system directory name and ID pair. You will need an ID and shared directory to complete this procedure. For example:
EFS identifier : <efs-ID>
EFS directory : <efs-shared-dir>
Log in to Host_1 as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
To make it easy to be consistent in this procedure, set the shell variables for the EFS identifier and EFS directory values. For example:
EFS_SHARED_DIR=<efs-shared-dir>
EFS_ID=<efs-ID>
Create the shared directory:
cd /mnt/
sudo mkdir ${EFS_SHARED_DIR}
Verify the directory has been created:
ls -l
drwxrwxrwx 4 incorta incorta 6144 Feb 11 18:54 <efs-shared-dir>
Mount the directory:
sudo mount -t efs ${EFS_ID}:/${EFS_SHARED_DIR} /mnt/${EFS_SHARED_DIR}
Create the Tenants directory:
sudo mkdir ${EFS_SHARED_DIR}/Tenants
Modify the mount point's access rights for the incorta group and user:
sudo chown -R incorta:incorta ${EFS_SHARED_DIR}
sudo chmod -R go+rw ${EFS_SHARED_DIR}
Do the same for the Tenants directory:
sudo chown -R incorta:incorta ${EFS_SHARED_DIR}/Tenants
sudo chmod -R go+rw ${EFS_SHARED_DIR}/Tenants
Get the full path of the Tenant directory for later use:
cd ${EFS_SHARED_DIR}/Tenants
pwd
/mnt/<efs-shared-dir>/Tenants
Next, create a file in the Tenants directory:
echo "efs test" > test.txtls -lcat test.txtefs test
Log out of Host_1.
exit
This completes the setup process for Host_1. Next, set up Host_2, Host_3, Host_4, and Host_5. They will mount the shared directory to access the contents of Tenants.
Create the EFS Mount for Host_2, Host_3, Host_4 and Host_5
Log in to Host_2 as the incorta user. When repeating this step, remember to change ${HOST_2} to ${HOST_3}, ${HOST_4}, and ${HOST_5}:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_2}
Set the shell variables:
EFS_SHARED_DIR=<efs-shared-dir>
EFS_ID=<efs-ID>
Create the local directory:
cd /mnt/
sudo mkdir ${EFS_SHARED_DIR}
Mount the shared directory:
sudo mount -t efs ${EFS_ID}:/${EFS_SHARED_DIR} /mnt/${EFS_SHARED_DIR}
Verify Host_2 can access the test.txt file created by Host_1:
ls -l /mnt/${EFS_SHARED_DIR}/Tenants/
cat /mnt/${EFS_SHARED_DIR}/Tenants/test.txt
efs test
Log out of Host_2:
exit
Now repeat these steps for Host_3, Host_4 and Host_5.
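Note that a mount created with the mount command does not persist across reboots. If you want the EFS share to remount automatically, a hedged sketch of an /etc/fstab entry that uses the amazon-efs-utils mount helper (add it on each of the five Incorta hosts):
<efs-ID>:/<efs-shared-dir> /mnt/<efs-shared-dir> efs _netdev,defaults 0 0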
Installing and Configuring Apache ZooKeeper
Incorta uses Apache ZooKeeper for distributed communications. In this procedure, you will install and configure Apache ZooKeeper on Host_1, Host_6 and Host_7. The three ZooKeeper server instances will create a quorum for processing distributed messages.
Installing ZooKeeper
Log in as the incorta user then retrieve the Apache ZooKeeper installation file. Start with Host_1:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Fetch Apache ZooKeeper version 3.6.1 and place it into the host's /tmp directory:
cd /tmp
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz
Extract the contents of the file then make it part of the locally installed software packages:
tar -xzf apache-zookeeper-3.6.1-bin.tar.gz
sudo mv apache-zookeeper-3.6.1-bin /usr/local/zookeeper
Create a ZooKeeper data directory:
sudo mkdir /var/lib/zookeeper
Prepare to make a custom ZooKeeper configuration file by duplicating the sample configuration file:
sudo cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
Configuring ZooKeeper
Open the custom ZooKeeper configuration file with a text editor. For example:
sudo vi /usr/local/zookeeper/conf/zoo.cfg
Look for the line beginning with dataDir and change it to read:
dataDir=/var/lib/zookeeper
Move to the bottom of the file and add two lines as follows:
admin.enableServer=false
zookeeper.admin.enableServer=false
At the bottom of the file, add the IP addresses and port ranges for all ZooKeeper hosts:
server.1=<HOST_1_Private_IP>:2888:3888
server.2=<HOST_6_Private_IP>:2888:3888
server.3=<HOST_7_Private_IP>:2888:3888
Save your work, quit the editor, then log out of Host_1:
exit
Repeat Installing ZooKeeper and Configuring ZooKeeper for Host_6 and Host_7.
Setting Up the ZooKeeper IDs
You will complete this section for Host_1, Host_6 and Host_7. Start with Host_1.
Log in:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Create an ID file for Host_1. The ZooKeeper server ID for Host_1 is 1.
echo 1 | sudo tee -a /var/lib/zookeeper/myid
Log out of Host_1:
exit
Create an ID file for Host_6. The ZooKeeper server ID for Host_6 is 2.
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
echo 2 | sudo tee -a /var/lib/zookeeper/myid
exit
Create an ID file for Host_7. The ZooKeeper server ID for Host_7 is 3.
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
echo 3 | sudo tee -a /var/lib/zookeeper/myid
exit
Starting ZooKeeper
Start ZooKeeper for Host_1 by first logging in:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Run ZooKeeper's control script to start the service:
sudo /usr/local/zookeeper/bin/zkServer.sh start
Log out:
exit
Start ZooKeeper on Host_6:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
sudo /usr/local/zookeeper/bin/zkServer.sh start
exit
Start ZooKeeper on Host_7:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
sudo /usr/local/zookeeper/bin/zkServer.sh start
exit
Verifying Quorum
Next, query the ZooKeeper status on each ZooKeeper host to verify quorum. In a three-server ensemble, one server is elected Leader and the other two are Followers; the ensemble has quorum as long as a majority of servers is running.
On Host_1, check the status:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
exit
On Host_6, check the status:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
exit
On Host_7, check the status:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
exit
One host reports Mode: leader and the other two hosts report Mode: follower. This indicates a ZooKeeper quorum for the ZooKeeper Ensemble. In the example above, Host_1 and Host_6 are followers and Host_7 is the leader.
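You can also query each server directly over the client port with the srvr four-letter command, which is whitelisted by default in ZooKeeper 3.6; a quick sketch, assuming nc is available on the host (install it with sudo yum -y install nmap-ncat if needed):
echo srvr | nc localhost 2181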
Install MySQL Server and Create the Incorta Metadata Database
You will now set up a metadata database managed with MySQL Server. In this section of the guide, the procedure describes how to install MySQL on Host_1 and then create a database. The MySQL database stores metadata about Incorta objects such as schemas, business schemas, and dashboards. For production use, Incorta supports MySQL Server.
Install and Start MySQL server
There are references to two different types of root users: the Linux root user and the MySQL root user. The Linux root user is used to install MySQL server. The MySQL root user is a default administrative user for MySQL.
Start by logging into Host_1:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Switch to the Linux root user:
sudo su
Download the MySQL RPM:
rpm -ivh http://repo.mysql.com/mysql-community-release-el5-6.noarch.rpm
Install MySQL server:
yum install -y mysql-server
Start the MySQL server daemon:
service mysqld start
Starting mysqld (via systemctl): [OK]
If the above command fails, try starting MySQL server with the following command:
/etc/init.d/mysqld start
Create a password for the MySQL root user. The password 'incorta_root' is used for illustrative purposes only.
/usr/bin/mysqladmin -u root password 'incorta_root'
The command generates a warning that can be safely ignored.
Create the Incorta Metadata Database
You will now create the needed database.
Log in to the MySQL Client CLI:
mysql -h0 -uroot -pincorta_root
where:
-h = host: 0 references the localhost
-u = user: root in this case
-p = password: the password specified at installation time
Create the database. In this document, we are calling it incorta_metadata.
create database incorta_metadata;
Query OK, 1 row affected (0.00 sec)
Create the MySQL incorta users
After creating the incorta_metadata database, next create a MySQL user of the same name for each Incorta host in the cluster: Host_1, Host_2, Host_3, Host_4, and Host_5. For illustrative purposes, the MySQL user is incorta and the password is Incorta#1.
Create the MySQL users for Host_1. In this example, the hosts are on the 192.168.128.0/24 subnet. The usernames use both the localhost reference and the Private IP.
create user 'incorta'@'localhost' identified by 'Incorta#1';
create user 'incorta'@'<Host_1_Private_IP>' identified by 'Incorta#1';
Create a MySQL user for Host_2.
create user 'incorta'@'<Host_2_Private_IP>' identified by 'Incorta#1';
Create a MySQL user for Host_3.
create user 'incorta'@'<Host_3_Private_IP>' identified by 'Incorta#1';
Create a MySQL user for Host_4.
create user 'incorta'@'<Host_4_Private_IP>' identified by 'Incorta#1';
Create a MySQL user for Host_5.
create user 'incorta'@'<Host_5_Private_IP>' identified by 'Incorta#1';
Verify the users are created:
select User, Host from mysql.user where user = 'incorta';
+---------+---------------+
| User    | Host          |
+---------+---------------+
| incorta | localhost     |
| incorta | 192.168.128.1 |
| incorta | 192.168.128.2 |
| incorta | 192.168.128.3 |
| incorta | 192.168.128.4 |
| incorta | 192.168.128.5 |
+---------+---------------+
6 rows in set (0.00 sec)
Grant Database Access Privileges
After creating MySQL users, next grant the users ALL privileges for the incorta_metadata database.
For the Host_1 incorta users, grant the ALL privilege to the incorta_metadata database:
grant all on *.* to 'incorta'@'localhost' identified by 'Incorta#1';
grant all on *.* to 'incorta'@'<Host_1_Private_IP>' identified by 'Incorta#1';
For the Host_2 incorta users, grant the ALL privilege to the incorta_metadata database:
grant all on *.* to 'incorta'@'<Host_2_Private_IP>' identified by 'Incorta#1';
For the Host_3 incorta users, grant the ALL privilege to the incorta_metadata database:
grant all on *.* to 'incorta'@'<Host_3_Private_IP>' identified by 'Incorta#1';
For the Host_4 incorta users, grant the ALL privilege to the incorta_metadata database:
grant all on *.* to 'incorta'@'<Host_4_Private_IP>' identified by 'Incorta#1';
For the Host_5 incorta users, grant the ALL privilege to the incorta_metadata database:
grant all on *.* to 'incorta'@'<Host_5_Private_IP>' identified by 'Incorta#1';
Verify the privileges have been granted for all the MySQL incorta users:
show grants for 'incorta'@'localhost';
+----------------------------------------------------------------------------------+
| Grants for incorta@localhost                                                      |
+----------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'localhost' IDENTIFIED BY PASSWORD       |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33'                                       |
+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

show grants for 'incorta'@'192.168.128.1';
+----------------------------------------------------------------------------------+
| Grants for incorta@192.168.128.1                                                  |
+----------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'192.168.128.1' IDENTIFIED BY PASSWORD   |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33'                                       |
+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

show grants for 'incorta'@'192.168.128.2';
+----------------------------------------------------------------------------------+
| Grants for incorta@192.168.128.2                                                  |
+----------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'192.168.128.2' IDENTIFIED BY PASSWORD   |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33'                                       |
+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Verify privileges for Host_3, Host_4 and Host_5:
show grants for 'incorta'@'192.168.128.3';
show grants for 'incorta'@'192.168.128.4';
show grants for 'incorta'@'192.168.128.5';
Exit the MySQL Client:
exit;
Verify the MySQL incorta user has access to the incorta_metadata database:
mysql -h0 -uincorta -pIncorta#1 incorta_metadata
Exit the MySQL Client:
exit;
The incorta_metadata database has been created and verified.
Next, exit the root user and then log out of Host_1:
exit
exit
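Optionally, before moving on, you can confirm that the other Incorta hosts can reach MySQL over the network. A minimal sketch using the telnet utility installed earlier, run from Host_2 and assuming your Security Group allows TCP 3306 within the subnet (press Ctrl+] and then type quit to leave the telnet session):
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_2}
telnet <HOST_1_Private_IP> 3306
exit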
Install and Start the Cluster Management Console (CMC)
In this step you will install and start the CMC on Host_1. This requires you to unzip the incorta-package_<version>.zip file and run the Incorta installer. The installer guides you through making selections appropriate for installing a CMC that manages Incorta nodes on other hosts.
Run the Incorta Installer
Log in to Host_1 as the incorta user.
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Start by unzipping the Incorta package:
cd /tmp
mkdir incorta
unzip incorta-package_<version>.zip -d incorta
Run the Incorta installer:
cd /tmp/incorta
java -jar incorta-installer.jar -i console
The Incorta installer will present a series of prompts. The answers for creating a CMC on the host are shown to the right of each prompt below.
The first set of prompts relate to the installation. All the prompts are important. However, note the one labeled "Incorta HA components". Here you select the CMC option to install the CMC software. For the Incorta hosts (later steps) you will select (2) for "Incorta HA components".
Welcome prompt : Press ENTER
License Agreement : Y : (Accept)
Installation Type : 1 : (New Installation)
Installation Set : 2 : (Custom Installation)
Incorta HA components : 1 : (Central Management Console (CMC))
Installation Folder : Press Enter to accept the default directory
By default the installation folder matches the home directory of the current user plus IncortaAnalytics. For example: /home/incorta/IncortaAnalytics. This is the directory you created in a previous step.
The second set of prompts is concerned with how the CMC is made available for use as well as how other hosts communicate with it. This guide will use the default ports as shown.
CMC Configuration—Step 1
Server Port (6005) : Accept (Enter)
HTTP Connector Port (Default: 6060) : Accept (Enter)
HTTP Connector Redirect Port (Default: 6443) : Accept (Enter)
AJP Connector Port (Default: 6009) : Accept (Enter)
AJP Connector Redirect Port (Default: 6443) : Accept (Enter)
The third and final set of prompts address the memory heap size for the CMC and the administrator username and password. For the CMC, there is only one administrator user and no other users. This guide uses the default values. The password, Incorta#1, is for illustration purposes.
CMC Configuration—Step 2
Memory Heap Size : Accept default (Enter)
Administrator's Username : Accept default (Enter) (admin)
Administrator's Password : Incorta#1
With all of the installation parameters entered, press Enter to begin the installation process at the Ready To Install CMC prompt.
Select Start CMC to start the CMC once installation is complete. At the installation status prompt, confirm the successful start of the CMC. You should see the following:
==============================================================
Installation Status
-------------------
Success! Incorta Analytics has been installed under the
following path:
/home/incorta/IncortaAnalytics/cmc
To access your CMC installation please go to this link.
http://<HOST_1_Private_IP>:6060/cmc/
Sign in to the CMC
In a browser, navigate to the IPv4 Public IP address of the CMC:
http://<HOST_1_IPv4_Public_IP>:6060/cmc/
At the login prompt, use your login information. In this document, the admin user and password Incorta#1 are used as specified in the previous section.
Select Sign In. Confirm you see the Welcome message. This confirms you have installed the CMC.
Keep the Cluster Management Console open as it will be used later to create the Incorta Cluster.
Log out of Host_1:
exit
Install, Configure, and Start Apache Spark
You need to install Apache Spark on Host_1. Incorta requires a specific version of Spark to run. This version is included in the Incorta package.
Installing Apache Spark
You will run the Incorta installer once again on Host_1, this time selecting "Incorta HA components".
Start by logging in to Host_1 as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
Change directories to /tmp/incorta
and start the Incorta installer:
cd /tmp/incorta
java -jar incorta-installer.jar -i console
IMPORTANT: As this procedure is for installing Apache Spark, do NOT respond as if you were installing the CMC. At the "Incorta HA components" prompt, select the Incorta HA Node option instead of the CMC option.
The sole purpose of installing the Incorta HA components on Host_1 is to install the correct version of Apache Spark that the Incorta High Availability Cluster requires. This guide uses the default port and default installation directory.
Below are the responses for each prompt the installer will present.
Welcome prompt : ENTER
License Agreement : Y (Accept)
Installation Type : 1 (New Installation)
Installation Set : 2 (Custom Installation)
Incorta HA components : 2 (Incorta HA Node)
Installation Folder : Press enter to accept the default
Incorta Node Agent Port : Press enter to accept the default (Default: 4500)
Public IP : Use HOST_1 public IP address
Start Node Agent : 0 (Disable automatic start)
Press Enter at the Ready to Install prompt. At the success prompt, press Enter.
Configure Spark
You will find Spark in /home/incorta/IncortaAnalytics/IncortaNode/spark. Configuration requires you to add the Spark binaries directory to the PATH environment variable. Edit /etc/profile.d/custom.sh to include the Spark home directory environment variable and add the Spark binary directory to the path.
Open custom.sh:
sudo vim /etc/profile.d/custom.sh
Add SPARK_HOME and add to the PATH:
#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.amzn2.x86_64
export SPARK_HOME=/home/incorta/IncortaAnalytics/IncortaNode/spark
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin
Save custom.sh and quit the editor.
Update your environment variables:
source ~/.bash_profile
Set up to work on the Spark configuration files:
cd $SPARK_HOME/conf
Set Spark DNS Address, IP Address and Port Numbers
With the editor of your choice, open spark-env.sh. Find the parameters below and set them as follows:
SPARK_PUBLIC_DNS=<HOST_1_Public_DNS_IPv4>
SPARK_MASTER_IP=<HOST_1_Private_IP>
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=9091
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=9092
SPARK_WORKER_MEMORY=6g
This guide uses the default values as shown. Adjust the SPARK_WORKER_MEMORY value as appropriate for your installation. Save this file and exit the editor.
Define Spark Resource Limits
Open spark-defaults.conf with the editor of your choice and set the following:
spark.master spark://<HOST_1_Private_IP>:7077
spark.eventLog.enabled true
spark.eventLog.dir /home/incorta/IncortaAnalytics/IncortaNode/spark/eventlogs
spark.local.dir /home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.executor.extraJavaOptions -Djava.io.tmpdir=/home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.driver.extraJavaOptions -Djava.io.tmpdir=/home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.cores.max 2
spark.executor.cores 2
spark.sql.shuffle.partitions 2
spark.driver.memory 6g
spark.port.maxRetries 100
The values for spark.cores.max, spark.executor.cores, spark.sql.shuffle.partitions, spark.driver.memory and spark.port.maxRetries are hardware specific. Set them as appropriate for the cluster you are building. For more information, see Performance Tuning, section Analytics and Loader Service Settings.
The paths entered for spark.eventLog.dir, spark.local.dir, spark.executor.extraJavaOptions, and spark.driver.extraJavaOptions refer to locations under /home/incorta/IncortaAnalytics.
Start Spark:
cd ~/IncortaAnalytics/IncortaNode
./startSpark.sh
Log out of Host_1:
exit
In a browser, navigate to the Spark master web interface to view the Spark master node:
http://<HOST_1_IPv4_Public_IP>:9091
You will see <HOST_1_Private_IP>:7077 at the top of the page and information about the applications Spark is handling. At this point it will only show the Worker that just started.
View the Spark worker node at port 9092:
http://<HOST_1_IPv4_Public_IP>:9092
You will again see an Apache Spark page, showing <HOST_1_Private_IP>:7078 at the top of the page and potentially tables showing running and finished executors. This shows that Spark is now installed, configured and ready to use.
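Optionally, you can submit a trivial job to confirm the master accepts applications. A hedged sketch, run on Host_1 and assuming the standard spark-shell script ships with the Incorta Spark build (it is on the PATH through the SPARK_HOME setting made earlier):
spark-shell --master spark://<HOST_1_Private_IP>:7077
scala> spark.range(100).count()
scala> :quit
The count should return 100, and the completed application will appear on the Spark master web page.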
Install the Incorta HA Components
The Incorta HA components include the Node Agent. The Node Agent is used to start and stop an HA node. The Node Agent is all you are concerned with here. The other binaries are used only after the Node Agent has been started.
You will now install the Incorta HA components on the hosts intended to support the Loader and Analytics services. Perform the following for the loader and analytics hosts (Host_2, Host_3, Host_4, and Host_5).
Log in to Host_2 as the incorta user:
ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@<HOST_2_IPv4_Public_IP>
Change to the directory containing the Incorta package (for example, /tmp) and unzip it:
cd /tmp
mkdir incorta
unzip incorta-package_<version>.zip -d incorta
Run the Incorta installer:
cd /tmp/incorta
java -jar incorta-installer.jar -i console
The responses to the prompts in this wizard are the same as when you installed HA components to gain access to Spark, except you will start the Node Agent this time.
Welcome prompt : Enter
License Agreement : Y
Installation Type : 1 (New Installation)
Installation Set : 2 (Custom Installation)
Incorta HA Components : 2 (Incorta HA Node)
Installation Folder : /home/incorta/IncortaAnalytics
Node Agent Configuration (port) : 4500
Public IP Address : <HOST_2_IPv4_Public_IP>
Start Node Agent : 1 (Start node agent)
Now press Enter to begin the installation. The following should appear indicating the installation has been successful:
=============================================================
Installation Status
-------------------
Success! Incorta Analytics has been installed under the
following path:
/home/incorta/IncortaAnalytics/IncortaNode
Log out of Host_2:
exit
You have completed the installation of the Incorta HA components for Host_2. The Incorta Node Agent is running on this host as well. To complete this procedure, you will need to install the Node Agent on Host_3, Host_4 and Host_5 using these same instructions.
Create the Incorta Cluster
You will create an Incorta cluster in this procedure. This will be a cluster ready for the federation process in which nodes are made known to the cluster and services are added to the nodes. Please note hosts are now referred to as nodes in the documentation.
The wizard used in this procedure requires five steps to complete:
- Basic: provide the name of the cluster
- Database: set up a reference to the database managed by MySQL from Create the Incorta Metadata Database
- System Admin: Set up credentials for accessing the cluster
- ZooKeeper: Specify the nodes responsible for ZooKeeper redundancy
- Spark Integration: Set up how the CMC and Spark communicate
Start by signing into the CMC using the credentials you set up during CMC installation. If you have inadvertently closed the CMC window, point your browser to
http://<HOST_1_IPv4_Public_IP>:6060/cmc/
Select Clusters in the Navigation Bar. The action bar should read home > clusters.
Bring up the new clusters wizard by selecting the add button (+ icon, upper right).
Basic
For the name of the cluster, use exampleCluster. Select the Check button to make sure the name is not already in use. Select Next to move to the Database step.
Database
You previously created a MySQL metadata database for Incorta (incorta_metadata). For the Database Type, select MySQL.
For JDBC URL, use the private IP address of the host managing the metadata database. This is <HOST_1_Private_IP>. For the port number, use 3306. For the database name, use the actual name of the database, which is incorta_metadata if you have been using the values provided in this guide. Your entry for JDBC URL should look like this:
jdbc:mysql://<HOST_1_Private_IP>:3306/incorta_metadata?useUnicode=yes&characterEncoding=UTF-8
For username and password, enter the following as previously described:
Username : incorta
Password : Incorta#1
Select Next to move to the System Admin step.
System Admin
In this document, we are using admin for the Username and Password:
Username : admin
Password : Incorta#1
Email : admin@incorta.com
Path : /mnt/<efs-shared-dir>/Tenants
where /mnt/<efs-shared-dir>/Tenants is confirmed through ls /mnt. For example: /mnt/efs_03a5369/Tenants
Select Check disk space.
Select Next to proceed to the ZooKeeper step.
ZooKeeper
ZooKeeper URL: <HOST_1_Private_IP>:2181,<HOST_6_Private_IP>:2181,<HOST_7_Private_IP>:2181
Use the private IP addresses for all hosts that will be making up the ZooKeeper ensemble. Select Next to advance to the Spark Integration step.
Spark Integration
In a browser, navigate to <HOST_1_IPv4_Public_IP>:9091
Look for the line near the top of the page beginning with "URL". Copy this URL including the port number 7077 and paste it in the text box for Master URL. For example,
Master URL: spark://ip-192-168-128-1.ec2.internal:7077
This guide uses the default values for the remaining entries:
App Memory (GB) : 1
App Cores : 1
App Executors : 1
DS Port : 5442
Select Next to get to the Review step and review your settings. Use the Back button to view and make changes to settings in previous steps. When you are satisfied with your settings, select the Create button. You will receive notification the cluster was successfully created. Select Finish.
Federating Nodes in the Incorta Cluster
In this guide, Host_2 and Host_3 will run the loader service and Host_4 and Host_5 will run the analytics service. Federating Nodes means adding the hosts running the Node Agent to the cluster. Once the hosts are federated, they are referred to as Incorta Nodes.
In the Cluster Navigation Bar, select Nodes and verify the Action Bar shows home > nodes.
You will use the Federation wizard to add Nodes to the Cluster. In short, this wizard goes through three steps, two of them requiring information from you:
- Discover: identify a host to federate by its private IP address
- Federate: provide a unique name for the node
- Finish: complete the federation process
Start the Node Federation wizard by selecting the Add button (+).
Discover
Host : Enter the private IP address for the host, for example: <HOST_2_Private_IP>
Port : Accept the default of 4500
Select Next to proceed to federating individual nodes.
Federate
In this step you name the nodes. See the table below. Select Federate once the unique name is confirmed. Add another Node by selecting Add another node in the canvas and add the second Node. Continue this process for the nodes on Host_4 and Host_5.
The Node naming convention is shown in the table below:
EC2_Host | Name |
Host_2 | IncortaNodeLoader_1 |
Host_3 | IncortaNodeLoader_2 |
Host_4 | IncortaNodeAnalytics_1 |
Host_5 | IncortaNodeAnalytics_2 |
You now have four federated nodes: two will be designated as Loader nodes and two will be designated as Analytics nodes. What these nodes do is determined by the services you assign to them. In the next procedure, you will assign loader and analytics services to the four nodes.
Add Services to the Nodes in the Cluster
To add services to the nodes in your cluster, access the nodes page from inside the desired cluster.
Add and Configure the Loader Services
In the federated nodes canvas, a list of the four federated nodes is visible. Select the label for the first Loader Node (i.e., IncortaNodeLoader_1).
The canvas that appears is specific to the node and will be labeled according to the node name. Select Services in the bar toward the bottom of the canvas then select the Add button (+) to bring up the Create a new service wizard. This wizard has two steps:
- Basic: give the service a name, declare the type of service, and set its resource utilization (memory, CPU)
- Additional Settings: supply port numbers for the services required for a Loader service
In this guide, the resource utilization settings are the defaults. You may need to make adjustments in either step to accommodate the needs of your cluster.
Basic Settings
In this step, you provide a name for the service, designate the type of service being created and its resource utilization limits. As this Node was designated as a Loader, the guide uses the following values for the parameters:
Service Name : LoaderService_1
Type : Loader
Memory Size (GB) : 12 (adjust as appropriate)
CPU Utilization (%) : 90 (adjust as appropriate)
Select Next.
Additional Settings
This guide uses the following port numbers for Loader services:
Tomcat Server Port : 7005
HTTP Port : 7070
HTTP Redirect Port : 7443
Select Create. The loader service will be created and associated with IncortaNodeLoader_1. Select Finish.
In the Navigation bar, select Nodes again then select the second Loader node. Follow the procedure used for node IncortaNodeLoader_1, using the same settings, except name the service LoaderService_2.
Service Name : LoaderService_2
Type : Loader
Memory Size (GB) : 12
CPU Utilization (%) : 90
Select Next.
The guide uses the following port numbers:
Tomcat Server Port : 7005
HTTP Port : 7070
HTTP Redirect Port : 7443
Select Create. The loader service will be created and associated with IncortaNodeLoader_2. Select Finish.
Add and configure the Analytics Services
The Analytics Services are set up in a similar manner. Select Nodes, then select the first Analytics node, IncortaNodeAnalytics_1. As with the Loader nodes, select the Add button. For Basic Settings, this guide uses the defaults.
Basic Settings
Service Name : AnalyticsService_1
Type : Analytics
Memory Size (GB) : 13
CPU Utilization (%) : 75
Select Next.
Additional Settings
Tomcat Server Port : 8005
HTTP Port : 8080
HTTP Redirect Port : 8443
AJP Port : 8009
AJP Redirect Port : 8443
Select Create then select Finish. Repeat for the second Analytics node, setting the Service Name to AnalyticsService_2.
Configuring and Starting the Cluster
Configuring the Cluster means associating the created services with the Cluster. Doing this gives the Cluster functionality. When the services are associated (or joined) with the Cluster, they can be started and stopped individually or as part of a Cluster-wide operation. In Add Services to the Nodes, you associated a service with each Node. You now must connect the service with the cluster.
In the CMC Navigation Bar, select Clusters. Then in the Cluster list, select exampleCluster. Verify the path in the action bar reads: home > clusters > exampleCluster. Select Services. You will now add the Loader and Analytics services to the cluster. You will see each service in this canvas.
Join the Loader Services to the Cluster
- In the Services tab, in the Action Menu, select + to bring up the Add a service to the cluster dialog.
- In the Node pull down, select the first Loader node, IncortaNodeLoader_1. Recall that you added LoaderService_1 to this node.
- In the Service pull down, select LOADER.
- Select Add.
- Repeat steps 1 through 4 for the second Loader node, IncortaNodeLoader_2.
Join the Analytics Services to the Cluster
For the Analytics Services, use the Add a service to the cluster dialog again, only this time, select analytics nodes and services.
- In the Services tab, in the Action Menu, select + to bring up the Add a service to the cluster dialog.
- In the Node pull down, select IncortaNodeAnalytics_1. Recall that you added AnalyticsService_1 to this node.
- In the Service pull down, select ANALYTICS.
- Select Add.
- Repeat steps 1 through 4 for IncortaNodeAnalytics_2.
Start the Cluster
- Select the Details tab in the action bar.
- Select the Start button in the lower right half of the Cluster canvas.
- Select the Sync button in the upper right corner of the canvas to monitor the process of starting the cluster. When the Analytics and Loader services read "started", the cluster is up and running.
Create a Tenant
You will use the Create a Tenant wizard to create a tenant for use with this example cluster. Select the Tenants tab in the action bar. Then, select the Add button.
Tenant
Name : example_tenant
Username : admin
Password : Incorta#1
Email : admin@incorta.com
Path : Path to the shared disk space (for example, /mnt/efs_090e465/Tenants)
Select Check disk space to confirm there is sufficient room for your datasets.
Change the switch position (to the right) to Include Sample Data, then select Next.
In this step, you are identifying who to contact about the tenant. You can select the Create button and finish creating the tenant at this point as the values here are optional. Illustrative examples are shown below.
Sender's Username Auth : Disabled (default)
System Email Address : Tenant owner's email address (for use as a user)
System Email Password : Tenant owner's password
SMTP Host : smtp.gmail.com (default)
SMTP Port : 465 (default)
Share Notifications : Disabled (default)
Verifying the Tenant
Load a Schema
Log in to Incorta at <HOST_4_IPv4_Public_IP>:8080/incorta/#/login. Use the administrator user and password, for example, admin/Incorta#1. In the Navigation bar, select Schema. From the list of schema, select SALES. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. Note the Last Load Status information. This shows the most recent load event for this schema and confirms the SALES schema is accessible to AnalyticsService_1.
Now check AnalyticsService_2 by logging into <HOST_5_IPv4_Public_IP>:8080/incorta/#/login. Select Schema from the Navigation bar and then select the SALES schema. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. Look at the Last Load Status information and see that it indicates the schema was just loaded. This confirms the SALES schema is accessible to AnalyticsService_2.
Checking SQLi Access
Incorta supports the SQL interface (SQLi) by exposing itself as a PostgreSQL database. Any client that runs SQL queries against PostgreSQL via JDBC can query Incorta.
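Before configuring a GUI client, you can optionally sanity-check the SQLi endpoint from a shell; a minimal sketch, assuming the PostgreSQL psql client is installed on your local machine (enter the tenant administrator password, Incorta#1, when prompted, then run the same query used below):
psql -h <HOST_4_IPv4_Public_IP> -p 5436 -U admin example_tenant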
To check SQLi access, you will use DbVisualizer (free version) to connect with the Incorta tenant. You will need the IP addresses of both of the analytics hosts (<HOST_4_IPv4_Public_IP> and <HOST_5_IPv4_Public_IP>) and the name of the tenant (example_tenant) associated with the cluster.
Begin by downloading DbVisualizer and installing it on your local host. Start DbVisualizer. If the Connection Wizard appears, cancel out of it. You are going to enter the connection parameters in a form which shows all parameters at once.
Check SQLi Access Through IncortaNodeAnalytics_1
Create a connection using the DbVisualizer menus. From the DbVisualizer menu, select Database. From the menu, select Create Database Connection and then select the No Wizard button. This results in a Database Connection tab appearing with example parameter values. Enter the following parameters for the text boxes in the Connection tab.
Connection
Name : SQLi Check (what the connection is for)
Notes : -- (optional)
Database
Settings Format : Server Info (not changeable)
Database Type : PostgreSQL (type of database to read)
Driver (JDBC) : PostgreSQL (driver to use to connect to database)
Database Server : <HOST_4_IPv4_Public_IP> (IP address of IncortaNodeAnalytics_1)
Database Port : 5436 (SQLi port)
Database : example_tenant (name of the database; this is the tenant)
Authentication
Database Userid : admin (user ID to use when accessing the tenant)
Database Password : Incorta#1
Options
Auto Commit : <check>
Save Database Password : Save Between Sessions
Permission Mode : Development
When you have completed your entries, check to be sure you can access the server through your specified port by selecting the Ping Server button. If that works, connect to the server and access the database by selecting the Connect button. If you cannot successfully ping the server, check that the IP address, the port number, and the database name are correct. If you cannot connect to the server after successfully pinging it, check the Database name and Authentication parameters and try again.
Checking the Operability of IncortaNodeAnalytics_1
You can run a query on the database to confirm your connection is completely operational. From the DbVisualizer menu, select SQL Commander then select New SQL Commander. Set Database Connection to SQLi Check. For the remaining text boxes, enter:
example_tenant : SALES : 1000 : -1
Enter a query in the editor. For example:
select * from SALES.PRODUCTS
You should see a list of products in the output window below. This confirms your ability to connect to Incorta using SQLi with this Analytics service. Disconnect from the database. From the DbVisualizer menu, select Database, then, from the menu, select Disconnect.
Check SQLi Access Through IncortaNodeAnalytics_2
Return to the Connection tab and change the IP address for the Database Server parameter to the IP address of IncortaNodeAnalytics_2:
Database Server : <HOST_5_IPv4_Public_IP>
Select the Ping Server button to be sure the Node is accessible through the port. Then select the Connect button.
Checking the Operability of IncortaNodeAnalytics_2
Run a query as you did for IncortaNodeAnalytics_1. You can use the same SQL Commander tab; select it and select the run button. You will see the same results as you did for IncortaNodeAnalytics_1. Disconnect from the database. From the DbVisualizer menu, select Database, then, from the menu, select Disconnect.
Summary of Accomplishments So Far
At this point you have established basic functionality of the Cluster:
- CMC: you created, composed and started a Cluster
- SQLi: verified a connection to Incorta by running a query
Now that you know the cluster is operational, you can set up a load balancer to support High Availability (HA).
Add Support for High Availability
You can configure a load balancer to be a single point of access for the Analytics Nodes in your cluster. You can also configure the load balancer to monitor the health of the Analytics Nodes so as to not route traffic to Nodes that are unresponsive.
Load Balancers (LB)
AWS offers the Classic Load Balancer and Network Load Balancer for EC2 hosts. Generally the configuration concerns are:
- Creating a single point of access
- Registration: tell the load balancer which Nodes it should route traffic to
- Health Checks: tell the load balancer what to do when a Node is unresponsive
For more information and set up instructions, see:
- What is a Classic Load Balancer?
- Tutorial: Create a Classic Load Balancer
- What is a Network Load Balancer?
- Tutorial: Create a Network Load Balancer
To complete the tutorials you will need your AWS account credentials and the public IP addresses of your Analytics Nodes.
The end of this process yields a URL. This is the access point through the LB to your cluster. Here are two examples:
http://classiclb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/
http://networklb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/
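If you prefer the AWS CLI to the console tutorials, the following is a minimal sketch of the Classic Load Balancer pieces this guide relies on; incorta-clb is a placeholder name, and the subnet, security group, and instance IDs are placeholders for your environment:
aws elb create-load-balancer --load-balancer-name incorta-clb \
  --listeners "Protocol=HTTP,LoadBalancerPort=8080,InstanceProtocol=HTTP,InstancePort=8080" \
  --subnets subnet-xxxxxxxx --security-groups sg-xxxxxxxx
aws elb configure-health-check --load-balancer-name incorta-clb \
  --health-check Target=TCP:8080,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3
aws elb register-instances-with-load-balancer --load-balancer-name incorta-clb \
  --instances i-0aaaaaaaaaaaaaaaa i-0bbbbbbbbbbbbbbbb
Session stickiness on port 8080 can then be enabled with create-lb-cookie-stickiness-policy and set-load-balancer-policies-of-listener, or from the console; the Network Load Balancer and its TCP target group for ports 5436 and 5442 are created analogously with the aws elbv2 commands.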
Confirming Cluster High Availability
High Availability (HA) in Incorta means functionality is available as long as at least one Loader service and one Analytics service are running. You should be able to log in to Incorta, do work and expect to continue to do work if one of each service type is available.
In this section, you will test whether Incorta operates with High Availability given service outages:
- one analytics service out
- one loader service out
You will test these conditions using the Incorta web GUI as well as through DbVisualizer.
Completing these tests using both Incorta access methods will confirm your instance of Incorta is High Availability.
NOTE: High Availability Incorta Clusters do not support distributed schemas.
Test the Classic Load Balancer through Incorta Web GUI
To complete the next two tests, you will need your running and configured Cluster as well as the running and configured Classic Load Balancer (CLB).
Analytics Service
Here you will test whether the Analytics Services are highly available. You will do this by viewing a Dashboard, first with both Analytics Services started and then with only one. If Dashboards can be viewed under both conditions, you can conclude that the Analytics Services are highly available.
It is not possible to determine which of the two Analytics Services your session will engage when logging in. For this reason, you may need to stop both services in turn, looking for an effect on your session. For example, if you log in, starting a session with AnalyticsService_1, then stopping AnalyticsService_2 will have no impact on your session. However, if you stop AnalyticsService_1, your session will end, requiring you to log back in. Because of failover, when you re-start your session, you will be able to resume from where you left off.
Start by logging in to the CMC in the usual way. Select Clusters in the Navigation bar, then select Services in the Action bar and verify that both Loader Services and both Analytics Services are started. Next, start an Incorta session by logging in through the CLB. Select Content to view the dashboards, select Dashboards Demo, then select Sales Executive Dashboard. A number of insights appear on the canvas, which confirms that an Analytics Service is running. Next, you will stop and start the Analytics Services through the CMC to test High Availability.
1. Go to the CMC, select Clusters in the Navigation bar, then select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_1. In the canvas, select Stop.
2. Go to the Analytics session and refresh the Dashboard. One of two things will happen. If the refresh happens immediately, your session is running through AnalyticsService_2 and was unaffected. If, on the other hand, the refresh stalls and you are eventually presented with the login screen, your session was running through AnalyticsService_1. If your session was running through AnalyticsService_1, you can now test High Availability; proceed to step 3. Otherwise, proceed to step 4.
3. Log in to Incorta through the CLB. You should find that your session returns you right back to where you left off. Refresh the Dashboard to confirm the session has been restored. Your session can only be running through AnalyticsService_2 now. Incorta has failed over so that you can continue working. Continue to step 7.
4. Go to the CMC, select Clusters in the Navigation bar, then select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_1. In the canvas, select Restart. In the Navigation bar, select Clusters, then select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_2. In the canvas, select Stop.
5. Return to your Incorta session and refresh the Dashboard. The refresh should stall and you will eventually be logged out. This shows your session was running through AnalyticsService_2.
6. Log in to Incorta through the CLB. You should find that your session returns you right back to where you left off. Refresh the Dashboard to confirm the session has been restored. Your session can only be running through AnalyticsService_1 now. Incorta has failed over so that you can continue your session. Continue to step 7.
7. Go to the CMC, select Clusters in the Navigation bar, select Details in the canvas, then select Restart. Select the sync button periodically until the services read Started. Once they do, confirm all services have restarted by selecting Services in the canvas.
This result shows that Analytics sessions are highly available. Although you need to log in again, the state of the session is retained and shown to you immediately after you log back in.
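To watch this failover from outside the GUI while you work through the steps above, you can poll the Classic Load Balancer's health check with the AWS CLI. This is only a sketch with a placeholder load balancer name; the timing of the InService/OutOfService transitions depends on your health check interval and thresholds.
# Print each registered instance's health state every 10 seconds (my-incorta-clb is a placeholder)
while true; do
  date
  aws elb describe-instance-health --load-balancer-name my-incorta-clb \
    --query 'InstanceStates[].[InstanceId,State]' --output text
  sleep 10
done
Stopping an Analytics Service should move its instance to OutOfService once the unhealthy threshold is reached, and restarting it should return the instance to InService.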
Loader Service
Here you will test whether the Loader Services are highly available. You will do this by initiating schema load operations. You will find that loading is available as long as at least one Loader Service is started. This demonstrates the active-active nature of the Loader Services.
Start by logging in to the CMC in the usual way. Select Clusters in the Navigation bar, then select Services in the Action bar and verify that both Loader Services and both Analytics Services are started. Next, start an Incorta session by logging in through the CLB. In the Navigation bar, select Schema. From the list of schemas, select SALES. In the Action bar, select Load. From the Load menu, select Load Now, then select Full. In the Data Loading popup, select Load. Check the Last Load Status indicator; it should show a time close to the current time. This confirms that a load succeeds with both Loader Services started. Next, you will stop and start the Loader Services to test High Availability.
1. Stop LoaderService_1. Go to the CMC, select Clusters in the Navigation bar, then select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_1. In the canvas, select Stop.
2. Check Loader functionality. Go to the Analytics session. In the Navigation bar, select Schema. From the list of schemas, select SALES. In the Action bar, select Load. From the Load menu, select Load Now, then select Full. In the Data Loading popup, select Load. After a few moments, the Last Load Status indicator should show that the load succeeded at a new time relative to when you started this work.
3. Restart LoaderService_1. Go to the CMC, select Clusters in the Navigation bar, then select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_1. In the canvas, select Restart.
4. Stop LoaderService_2. Go to the CMC, select Clusters in the Navigation bar, then select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_2. In the canvas, select Stop.
5. Check Loader functionality. Go to the Analytics session. In the Navigation bar, select Schema. From the list of schemas, select SALES. In the Action bar, select Load. From the Load menu, select Load Now, then select Full. In the Data Loading popup, select Load. After a few moments, the Last Load Status indicator should show that the load succeeded at a new time relative to the last load request.
This shows that the Incorta Loader Services load schemas transparently: the only impact on the Analytics session is that the schema shows as loading and, eventually, as loaded. This result demonstrates that the Loader Services are also highly available.
Test the Network Load Balancer Through the SQLi Interface
To complete this test, you will need your running and configured Cluster as well as the running and configured Network Load Balancer (NLB).
SQLi accesses Incorta through the Network Load Balancer (NLB). The objective is to verify that SQL scripts can be run through an SQLi connection with either of the Analytics Services stopped. This is similar to the Analytics Service test performed through the CLB. You will use the same DbVisualizer (free version) installation as in the earlier SQLi checks.
1. Log in to the CMC and verify that all configured services are started.
2. Start DbVisualizer if it is not already started and select the connection titled SQLi Check from the Database tab on the left of the window.
3. In the canvas, change the value for Database Server to the NLB DNS name.
4. Confirm that the port number is 5436; change it if necessary.
5. Select Connect.
6. If no SQL Commander window is visible, from the DbVisualizer menu, select SQL Commander, then select New SQL Commander. Otherwise, select the existing SQL Commander tab.
7. In the SQL Commander editor, run this SQL script:
   select * from SALES.PRODUCTS
8. From the CMC, stop AnalyticsService_1.
9. Run the SQL script again. You will see a list of products from the SALES schema.
10. Start AnalyticsService_1 and stop AnalyticsService_2.
11. Run the SQL script again. You will again see a list of products from the SALES schema. You can conclude that it does not matter which Analytics Service serves the connection, and therefore SQLi access through port 5436 is highly available.
12. Repeat steps 4 through 11, this time connecting through port 5442. Your results will be identical to those using port 5436.
Because you stopped each Analytics Service in turn, you have shown that the SQLi interface supports High Availability.
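If you would rather drive this same check from a terminal, the sketch below repeatedly runs the query through the NLB on both ports while you stop and start the Analytics Services in the CMC. As with the earlier psql sketch, it assumes the SQLi endpoint is PostgreSQL wire-protocol compatible; the NLB DNS name, user, and password are placeholders for your own values.
# Placeholders: substitute your NLB DNS name and tenant credentials before running
export PGPASSWORD='<admin-password>'
NLB='networklb<aws-user-id>.us-east-1.elb.amazonaws.com'
while true; do
  for PORT in 5436 5442; do
    # The query should keep succeeding while at most one Analytics Service is stopped
    if psql "host=$NLB port=$PORT dbname=example_tenant user=admin" \
        -c "select * from SALES.PRODUCTS" >/dev/null 2>&1; then
      echo "$(date) port $PORT OK"
    else
      echo "$(date) port $PORT FAILED"
    fi
  done
  sleep 10
done
With only one Analytics Service stopped at a time, you should see OK for both ports throughout, matching the DbVisualizer result.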
Summary
Using both the Classic Load Balancer and Network Load Balancer, you have successfully confirmed High Availability for the Incorta Cluster.