Deploy on AWS

Incorta is the open data delivery platform powered by smart lakehouse technology. It enables customers, partners, and third-party developers to integrate with their modern data stacks to access business-ready data, migrate data to the cloud, and build innovative capabilities. With Incorta, users gain:

  • Unrivaled data access through Incorta’s Direct Data Mapping® technology. This technology enables analysis of raw business data that is 100% identical to the source down to the transaction level detail, even with multiple source systems and large datasets.
  • Unimaginably fast time to insight through the delivery of data in minutes. Quickly join additional data sources without impacting performance and empower users with sub-second self-service queries.
  • Trusted data accuracy by allowing complete control over data governance, giving users the confidence to make decisions using accurate data that is verifiable down to the transaction level.

From a development perspective, Incorta has no hard dependencies on specific AWS services. Incorta provides High Availability (HA) and unlimited data storage with Amazon S3 or Amazon EFS (Elastic File System). Additionally, Incorta can use Amazon Aurora as a metadata database and Amazon CloudWatch as an observability service in AWS deployments.

Launching Incorta Direct Data Platform on AWS sets up the following:

  • EC2 instance.
  • EBS storage.
  • Incorta Direct Data Platform.

Incorta's deployment from AWS Marketplace sets up the following services on the EC2 instance (typical completion time: 20-30 minutes):

  1. One Cluster Management Console (CMC). The CMC is a stand-alone application used to manage HA infrastructures, including clusters, nodes (i.e., servers), services, and tenants. Administrative functions, such as starting, stopping, and editing the configuration of nodes and clusters, are available through the CMC UI.
  2. One Incorta HA Node. An Incorta HA Node is a physical container of services residing on a single machine. Each node can contain multiple Incorta services (loader and/or analytics), assuming the machine has enough memory and computing power.
Note

The analytics service manages the Incorta UI and handles the user queries that are sent to Incorta for execution. The analytics service also manages report schedules and the on-demand export of data and images from Incorta. The loader service is dedicated to data loading activities, including creating Parquet files and materialized views.

Technical Requirements

Before launching the Incorta Direct Data Platform on AWS, make sure that you have the following:

Skills

To complete Incorta's deployment successfully, you need at least the following technical skills:

  • Familiarity with AWS
  • Basic knowledge of Linux. This document provides most of the commands you need, but a working knowledge of Linux commands is critical for real-world deployments.

Launching Incorta Direct Data Platform for a new VPC on AWS results in the following setup with the default parameters.

[Figure: AWS deployment diagram (images/AWS_Deployment.png)]

Infrastructure

Following AWS best practices, deploying the Incorta Direct Data Platform sets up a VPC in a single Availability Zone, with one public subnet that contains a NAT gateway and an Incorta server. The infrastructure components are described below:

  • Amazon VPC. "Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you've defined. This virtual network closely resembles a traditional network that you'd operate in your own data center, with the benefits of using the scalable infrastructure of AWS", according to Amazon's official site.
  • NAT Gateway. "You can use a network address translation (NAT) gateway to enable instances in a private subnet to connect to the internet or other AWS services, but prevent the internet from initiating a connection with those instances", according to the NAT Gateways article by Amazon.
  • IAM roles. Launching the Incorta Direct Data Platform instance configures IAM roles. "AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources. IAM is a feature of your AWS account offered at no additional charge. You will be charged only for use of other AWS services by your users", according to the IAM roles article by Amazon.
  • Security Group Rules. "The rules of a security group control the inbound traffic that's allowed to reach the instances that are associated with the security group and the outbound traffic that's allowed to leave them", according to the Security Group Rules article by Amazon.
  • Default Security Groups. "Your AWS account automatically has a default security group for the default VPC in each region. If you don't specify a security group when you launch an instance, the instance is automatically associated with the default security group for the VPC", according to the same Security Group Rules article by Amazon.
  • Custom Security Groups. "If you don't want your instances to use the default security group, you can create your own security groups and specify them when you launch your instances. You can create multiple security groups to reflect the different roles that your instances play; for example, a web server or a database server", according to the same Security Group Rules article by Amazon.
  • Working with Security Groups. Refer to Amazon's documentation on working with security groups for more information about creating, viewing, updating, and deleting security groups and their rules using the Amazon EC2 console.
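
As a rough illustration of the security group rules described above, the following Python sketch builds inbound rules in the dictionary shape that the AWS SDK for Python (boto3) accepts as IpPermissions. The function name, the port numbers, and the address range are placeholders chosen for this example and are not taken from the Incorta deployment; substitute the ports that your CMC and Incorta services actually listen on. The sketch only constructs and validates the rules and does not call AWS.

```python
import ipaddress


def build_ingress_rules(ports, cidr, description="example rule"):
    """Return inbound TCP rules in the IpPermissions shape used by boto3.

    ports: iterable of TCP port numbers to open (hypothetical values).
    cidr:  IPv4 range allowed to connect, for example an office network.
    """
    # Validate the CIDR early; strict=True rejects ranges with host bits set.
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")

    rules = []
    # De-duplicate and sort the ports so the output is deterministic.
    for port in sorted(set(ports)):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append(
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [
                    {"CidrIp": str(network), "Description": description}
                ],
            }
        )
    return rules


# Placeholder ports and a documentation-only address range (RFC 5737).
example_rules = build_ingress_rules([8443, 22], "203.0.113.0/24")
```

The resulting list could be passed as the IpPermissions argument of the boto3 EC2 client's authorize_security_group_ingress call, together with the GroupId of a custom security group. Restricting the source range to a known network, rather than opening ports to every address, keeps the rule set in line with the purpose of security groups quoted above.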

AWS Services

  • Amazon S3. "Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements", according to the Amazon S3 article by Amazon.
  • Amazon RDS. "Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need", according to the Amazon RDS article by Amazon. You can configure Amazon RDS for the Incorta Direct Data Platform instance with either an Oracle or a MySQL database.
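
To make the Oracle or MySQL choice concrete, the sketch below formats a JDBC connection URL for either engine. It assumes the database is reached through a standard JDBC URL; the function name, host name, and database name are hypothetical, and the ports are the engines' conventional defaults rather than values confirmed for an Incorta deployment.

```python
# Conventional default listener ports for each engine (assumed, not Incorta-specific).
DEFAULT_PORTS = {"mysql": 3306, "oracle": 1521}


def metadata_jdbc_url(engine, host, database, port=None):
    """Build a JDBC URL for a MySQL or Oracle database.

    engine:   "mysql" or "oracle" (case-insensitive).
    host:     database endpoint, for example an RDS endpoint host name.
    database: MySQL schema name, or Oracle service name.
    port:     optional override of the engine's conventional default port.
    """
    engine = engine.lower()
    if engine not in DEFAULT_PORTS:
        raise ValueError(f"unsupported engine: {engine}")
    port = DEFAULT_PORTS[engine] if port is None else port

    if engine == "mysql":
        return f"jdbc:mysql://{host}:{port}/{database}"
    # Oracle thin driver URL that addresses the database by service name.
    return f"jdbc:oracle:thin:@//{host}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
mysql_url = metadata_jdbc_url("mysql", "db.example.com", "incorta_metadata")
oracle_url = metadata_jdbc_url("oracle", "db.example.com", "ORCL")
```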