Nodes, Clusters, and Services
Relationship between node, cluster, and service
This section explains the hierarchy between nodes, clusters, and services, as shown below:
- First you install a node. A node is the physical container for the services that run on the same hardware (the same machine). One node per machine is typically enough, because the user can configure any number of services to run on the machine hosting that node, provided the machine is powerful enough to serve the expected workload.
- Once a node is installed, it can be federated under a Cluster Management Console (CMC) instance. Once this happens, no other CMC instance can federate it. The user can then configure services for this node remotely through the CMC interface, and can set all parameters of a new service, such as heap size, HTTP ports (for Analytics services only), email configuration, CMC login credentials, Spark integration settings, and others. A service cannot be started until it joins a cluster.
- A service is simply a JVM: a Tomcat instance running with a specific configuration provided by the user. All services on the same node share the same Tomcat binaries and the same incorta.war file. Each service writes its own logs.
- After this, the user can create a cluster, which is a virtual container of services hosted on different nodes installed on different machines. The cluster also controls a set of tenants, which the user can create from the CMC UI or with the TMT tool. For example, one cluster can have 3 Analytics services running on 3 different machines and 4 Loader services running on 4 different machines, 3 of which are the machines hosting the 3 Analytics services.
- The CMC can administer any number of clusters (e.g. DEV, UAT, SIT, PROD), including complex topologies. It can equally be used for a minimum installation: a single cluster with 2 services.
- A cluster can have any number of tenants.
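The hierarchy above can be sketched as a small data model. This is only an illustration of the rules in the list, not the CMC's actual API: the class names, method names, and the identifiers `cmc-1`, `PROD`, and `machine-N` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    name: str
    kind: str                      # "analytics" or "loader"
    machine: str                   # machine of the node hosting this service
    cluster: Optional[str] = None  # set once the service joins a cluster
    running: bool = False

    def start(self) -> None:
        # Rule from the list: a service cannot start until it joins a cluster.
        if self.cluster is None:
            raise RuntimeError(f"{self.name} must join a cluster before it can start")
        self.running = True


@dataclass
class Node:
    machine: str
    federated_by: Optional[str] = None  # name of the owning CMC instance
    services: List[Service] = field(default_factory=list)

    def federate(self, cmc: str) -> None:
        # Rule from the list: once federated, no other CMC can federate the node.
        if self.federated_by not in (None, cmc):
            raise RuntimeError(
                f"node on {self.machine} is already federated under {self.federated_by}"
            )
        self.federated_by = cmc

    def add_service(self, name: str, kind: str) -> Service:
        service = Service(name=name, kind=kind, machine=self.machine)
        self.services.append(service)
        return service


@dataclass
class Cluster:
    name: str
    services: List[Service] = field(default_factory=list)
    tenants: List[str] = field(default_factory=list)

    def join(self, service: Service) -> None:
        service.cluster = self.name
        self.services.append(service)


def build_example() -> Cluster:
    """Build the example topology: 4 machines, 4 Loader and 3 Analytics services."""
    nodes = [Node(machine=f"machine-{i}") for i in range(1, 5)]
    cluster = Cluster(name="PROD")
    for i, node in enumerate(nodes, start=1):
        node.federate("cmc-1")
        cluster.join(node.add_service(f"loader-{i}", "loader"))
        if i <= 3:  # only the first 3 machines also host an Analytics service
            cluster.join(node.add_service(f"analytics-{i}", "analytics"))
    return cluster
```

Calling `build_example()` returns a cluster whose services span four machines, three of which host both a Loader and an Analytics service.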
The advantage of multiple clusters on separate machines
A cluster that spans multiple machines is known as a horizontal cluster, a common term in high availability (HA). The CMC can administer multiple clusters of this kind. A horizontal cluster protects the solution against hardware failure and distributes the load, so that it can serve more client requests and more data.
A cluster that has multiple services running on the same machine is known as a vertical cluster. This topology protects the solution against software process failure: if a single JVM running on a machine fails, the machine does not become a wasted resource, because the other services on it keep running. It requires the hardware to be powerful enough (RAM, CPU, and I/O) to host the services running there.
In Rel4.0, however, a machine can run several Loader services but can NOT run more than one Analytics service, because of a port conflict related to the SQLi feature.
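The Rel4.0 restriction can be checked against a planned topology before any services are created. The following is a minimal sketch under the assumption that a plan is a list of `(machine, service kind)` pairs; the function name and the plan format are hypothetical and not part of the product.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def machines_over_analytics_limit(placements: Iterable[Tuple[str, str]]) -> List[str]:
    """Return, sorted, the machines that would host more than one Analytics service."""
    counts = Counter(
        machine for machine, kind in placements if kind == "analytics"
    )
    return sorted(machine for machine, n in counts.items() if n > 1)


# Hypothetical plan: machine-1 is a valid vertical setup (several Loader
# services plus one Analytics service), machine-2 breaks the Rel4.0 rule.
plan = [
    ("machine-1", "loader"),
    ("machine-1", "loader"),
    ("machine-1", "analytics"),
    ("machine-2", "analytics"),
    ("machine-2", "analytics"),
]
violations = machines_over_analytics_limit(plan)
```

Here `violations` contains only `machine-2`; several Loader services on `machine-1` are allowed.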
The advantage of multiple nodes on different machines
As explained above, a node is the way to instantiate services (Tomcat JVM processes) on the same hardware. Having multiple nodes on different machines is the way to build a horizontal cluster that provides failover and load balancing at the same time and protects the solution against hardware failure. If a machine fails, the node running on it goes down, but the nodes on the other machines remain available to serve business users until the failed machine is brought up again.
Multiple nodes also make partial upgrades possible. A cluster with 4 nodes can be upgraded one node at a time, or two nodes at a time. While you take down two nodes to upgrade them, the other two keep serving end users. Then you bring up the two upgraded nodes and shut down the other two to upgrade them.
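The partial upgrade procedure can be sketched as a loop over batches of nodes. This is an illustrative outline only: `stop`, `upgrade`, and `start` are hypothetical callbacks standing in for whatever procedure is actually used to shut down, upgrade, and restart a node.

```python
from typing import Callable, List, Sequence


def upgrade_batches(nodes: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the nodes into consecutive batches that are upgraded together."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size >= len(nodes):
        # Taking every node down at once would leave nobody serving end users.
        raise ValueError("batch_size must leave at least one node running")
    return [list(nodes[i:i + batch_size]) for i in range(0, len(nodes), batch_size)]


def rolling_upgrade(
    nodes: Sequence[str],
    batch_size: int,
    stop: Callable[[str], None],
    upgrade: Callable[[str], None],
    start: Callable[[str], None],
) -> None:
    """Upgrade one batch at a time while the remaining nodes keep serving."""
    for batch in upgrade_batches(nodes, batch_size):
        for node in batch:
            stop(node)
        for node in batch:
            upgrade(node)
        for node in batch:
            start(node)  # bring the batch back before touching the next one
```

For a cluster with 4 nodes and `batch_size=2`, `upgrade_batches` produces two batches of two nodes, which matches the two-at-a-time procedure described above; `batch_size=1` gives the one-at-a-time variant.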