

Release Notes 4.5

Welcome to version 4.5. This page describes the new features, enhancements, and key fixes in this release.

This release improves Incorta scale, with a focus on the following areas:

  • Reduced loader service startup time
  • Reduced loader service memory usage to increase performance
  • Support for larger data volumes in individual tables
  • Connections to more data sources
  • Data lake support

The following section provides more details and information on other features in this release.

New Features

Release 4.5 includes the following new features.

Hardware Reduction and Scalability: Reduced Loader Service Startup Time and Memory Usage

This enhancement results in the following major improvements:

  • Reduced loader service memory usage. Incorta no longer loads all columns of a schema into memory, and it evicts unused columns when memory is low.
  • Reduced loader service startup time. Only a subset of columns is loaded into memory at startup.
  • Improved speed of incremental updates. New data is appended to existing data rather than rewriting it with each incremental load.

In previous releases, all data had to be loaded into memory to create direct data mapping files, which required large loader service memory.

The analytics service has been enhanced to service queries using Parquet files together with direct data mapping files. Tables with load filters or encryption are still loaded into memory to create direct data mapping files.

Loading can be slower if your data volume exceeds available memory, because more data must be read from disk.

Incorta reads data columns directly from Parquet using a fast Parquet reader, which ensures that only the incremental records are loaded. There is no need to evict an entire data column and reload it, which minimizes the time for incremental loads. If memory is tight on the loader service, Incorta evicts columns from loader service memory after writing them to snapshots.

The following diagram illustrates the differences between the way Incorta used memory in previous releases and the way it uses memory in this release.

[Diagram: memory usage in previous releases compared to this release]

Additional memory changes in this release:

  • Analytics Service Column Warmup. Reading data columns directly from parquet reduces the time for dashboards and insight queries to refresh after an incremental load. However, after you restart the analytics service, dashboard queries load more slowly. To decrease the time to load dashboard queries after you restart the analytics service, you can choose to load and warm up specific columns first. You can choose one of the following column warmup strategies:

    • Business view columns: Load all columns referenced in business schema views in the Analytics service only
    • None: Don’t pre-load the columns. Only load on demand. None works best for small deployments with ad-hoc queries
    • Last used columns: Load the columns that were in memory prior to shutdown, in both the loader and analytics services
    • All (Replaces “Eager Load”): Pre-load all columns into memory. All works best when you need to support ad-hoc queries, if there are no business schemas in place, and when the time between the analytics service startup and dashboard usage is significant.

Admin UI Move to CMC

In this release, the Admin UI was removed and all configuration options moved to the Cluster Management Console (CMC). You can now upgrade cluster metadata from the CMC. Most options from the Admin UI are now available in the CMC. For more details, see Admin UI to CMC Details.

Enhanced User Experience: Formula Builder

The formula builder user interface changed in this release. The following is a screenshot of the new layout:

[Screenshot: the new formula builder layout]

Analyze users can now:

  • See a physical table column and a business view column from a business schema at the same time.
  • Search to find an entity by data type.
  • Search to find a formula function or variable.
  • Search to find a schema, table, and column.
  • See the syntax and an example of a selected formula function.
  • Modify the formulas.
  • Add comments to steps in a large formula.
  • Double click to add a column.
  • Have syntax validated.

The following formula functions were added:

Function | Type
between(exp value, exp min, exp max) | Boolean
isNan(double value) | Boolean
like(field exp, string pattern) | Boolean
double(string exp) | Conv
toChar(date exp, string format) | Conv
rowNumber() | Misc
exp(double exp) | Arithmetic
sqrt(double exp) | Arithmetic
trunc(double exp) | Arithmetic
addHours(date exp, int hours) | Date
addMilliseconds(date exp, int milliseconds) | Date
addMonths(date exp, int months) | Date
addSeconds(date exp, int seconds) | Date
addYears(date exp, int years) | Date
dateTrunc(date exp, string part) | Date
formatDuration(int duration) | Date
repeat(string value, int count) | String
reverse(string value) | String
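
For example, the new between and toChar functions can be used in formula expressions like the following sketch (the schema, table, and column names here are hypothetical, and the date format pattern is an assumption):

between(SALES.ORDERS.ORDER_TOTAL, 100, 500)
toChar(SALES.ORDERS.ORDER_DATE, "MM/dd/yyyy")

The first expression returns true when the order total falls within the range; the second formats a date column as a string.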

Enhanced User Experience Changes: Schema, session variables, and scheduler pages

The schema listing page changed in this release. The following is a screenshot of the new layout:

[Screenshot: the new schema listing page layout]

New features include:

  • Last Load Type
  • Next Load Time
  • Last Modified By
  • Pagination
  • Data Size is based on Parquet and snapshot size, not memory size
  • Schema Description

The session variables page changed in this release. The following is a screenshot of the new layout.

[Screenshot: the new session variables page layout]

The scheduler pages for dashboards, schema loads, and data alerts also changed in this release. The following are screenshots of the new layout.

[Screenshot: the new scheduler page layout]

[Screenshot: the new schema load scheduler layout]

Data Lake Support

Incorta now supports data lakes both as a host for Incorta tenants and as a data source. The following data lakes are supported:

  • ADLS Gen2
  • HDFS
  • AWS S3

You can use the following file types from the data lakes as data sources:

  • Parquet
  • ORC
  • CSV
  • Excel

For all data lakes you can:

  • Place all Incorta installer files, tenants, and objects on the data lake file share.
  • Read from the data lake using a pre-built connector.
  • Write the output of a materialized view back into the data lake file system.

You must perform some steps to use Hadoop with Spark on Windows. See Configure Spark to Work with Hadoop on Windows below for more information.

New Migration Tool

To take advantage of the memory enhancements, existing customers must run a migration tool to upgrade to Incorta 4.5 from previous versions of Incorta.

For more information on how to run the migration tool, see Migration Tool.

Scalability

In previous releases, you could load up to 1.7 billion records for tables with a single key column. In this release, you can load up to 3.4 billion unique values per column. If you use composite keys, you can load up to 1 trillion records.

OpenJDK 11 and Oracle JDK 8 Support

You can now use OpenJDK 11 and Oracle JDK 1.8+ with Incorta. To use OpenJDK 11, set JAVA_HOME and JRE_HOME to the OpenJDK 11 main folder.
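
For example, on Linux you might set the variables like this before starting Incorta (this is a sketch; the OpenJDK installation path is an assumption, so substitute your own):

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export JRE_HOME=$JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH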

Connectors

The following new data sources and connectors are supported in this release of Incorta. You can now select them when you select a data source:

  • SharePoint
  • Athena
  • Data lake connectors: ADLS Gen2, HDFS, AWS S3, and local files

JDBC Connection Properties

You can create a new connector with JDBC connection properties using property name and value pairs. When you select JDBC as a data source, you specify each property's name and value, along with the field type and a description.
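
For example, a connection to a MySQL source might include driver properties such as the following (property names are specific to each JDBC driver; these MySQL Connector/J pairs are illustrative only):

connectTimeout=30000
useSSL=false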

Data File Enhancements

Data Type Discovery: Improved the ability to discover dates and timestamps in CSV source data and format them in Incorta.

Chunking: Added support for multi-threaded extraction and automatic chunking of large CSV files to speed up extraction from local or file share systems.

Simpler Tenant Administration

You can now list, export, and import the following in bulk from the command line using an asterisk (*) as a wildcard:

  • Data sources
  • Session variables
  • Alerts

Fixed Issues

The following bug fixes and minor improvements were made in this release.

Component | Release Note
Performance | Enhanced the compaction logic for non-key columns.
Performance | Fixed an issue where log files grew large because they were flooded with Kafka warning messages.
Security | Fixed an issue where a user who signed in to tenant A, navigated to a tab, and then switched to tenant B still saw their tenant A data (for which they had permissions).
Security | Fixed an issue where user names with more than one space in the first and last name columns created an error on the Security tab.
Security | Previously, Incorta Okta setup required a URL with a trailing slash. A trailing slash is no longer required.
CMC | Fixed an issue where a 404 error displayed after a user logged in to CMC > Clusters > [Cluster Name] > Services > [Service_Name], viewed the logs, and then clicked back to the CMC home.
CLI Tools | Added an asterisk (*) to the end of sample data files to provide users with a hint for exporting and importing session variables.
Data Sources & Data Files | Fixed an error that prevented users from adding a multi-source table when a table from an SAP ERP Connector existed.
Installer | Changed the version of Hadoop (now 3.2.0) that comes with the Spark bundled with Incorta Analytics.
Schema | Alias tables now show the number of records from the base table.
Schema | Fixed an issue where editing a table, making a change, and then adding a join caused the change to disappear.
Schema | Fixed an issue where modifying a query with a Kafka data source and a WHERE condition created an error.
Schema | Fixed an issue where the number of columns in the table details differed from the number of columns in the schema view.
Schema | Fixed an issue where the schema page spun for more than 30 minutes.
Schema | Prior to 4.5, adding or deleting a column in a materialized Incorta table removed and re-created the whole table. In Release 4.5, the table is retained and only the modified columns are altered. This helps to better manage the lifecycle of an Incorta table.
Schema | Table aliases now refresh automatically when a base table changes.
Materialized Views | Fixed an issue that caused an error to display when saving a materialized view that uses a Python script.
Materialized Views | Fixed an issue where materialized view updates were ignored when the SQL included leading spaces, line breaks, or tabs.
Compaction | Fixed an issue where old compacted versions were not deleted after a new compaction ran.
Variables | Fixed an issue where a case-sensitive session variable did not return the login user name as expected.
Dashboards & Insights | Added support for the yen in Incorta Analytics.
Dashboards & Insights | Fixed an issue that caused an ArrayIndexOutOfBoundsException error on the dashboard.
Dashboards & Insights | Fixed an issue where the 24-hour time format did not work.
Dashboards & Insights | Fixed an issue where only 2 decimal places displayed on the dashboard after a user set the number of decimal places for data fields to 4.
Dashboards & Insights | Fixed an issue where an incorrect date mask in a formula could be saved even though it could not run.
Dashboards & Insights | Fixed an issue where Apple Maps did not render in Analyzer mode until after a user clicked Done to see the dashboard.
Dashboards & Insights | Fixed an issue where color formatting (using the Format Color Palette option) did not work when the coloring dimension was defined in a formula.
Dashboards & Insights | Fixed an issue where conditional formatting on a formula column did not display as expected.
Dashboards & Insights | Fixed an issue where insight rows did not align when the fixed column feature was used.
Dashboards & Insights | Fixed an issue where null value evaluation was not consistent when used in an arithmetic operation while aggregation was turned on.
Dashboards & Insights | Fixed an issue where sorting a formula column by negative, null, or positive values did not work properly.
Dashboards & Insights | Fixed an issue where the country code “CN” displayed “Cyprus No Mans Area” instead of “China.”
Dashboards & Insights | Fixed an issue where the date and timezone for the same date displayed differently in the insight filter than in the insight.
Dashboards & Insights | Fixed an issue where the error “Required columns not loaded” displayed while rendering a dashboard.
Dashboards & Insights | Fixed an issue where the parse date function did not work when a month was written in capital letters instead of title case.
Dashboards & Insights | Fixed an issue where the SchemaRefreshTime formula did not update as expected after a full load.
Dashboards & Insights | Fixed an issue where Total columns were left-justified when transpose was enabled in a report, even though the content was centered while editing the report.
Dashboards & Insights | Fixed an issue where users couldn’t see the field attributes modal when the field area was expanded on an insight with a large number of columns.
Dashboards & Insights | Fixed an issue with formatting within a transposed insight.
Dashboards & Insights | Fixed an issue where incorrect data displayed on a dashboard for a short period at the end of an incremental load cycle.
UI | Added a help button to the UI of the Incorta Analytics application.
UI | Added support for Japanese characters in schema and business view names.
UI | Added support for Japanese in Incorta Analytics.

More Information and Implementation Notes

Migration Tool

Run the Migration Tool Using Linux

Perform the following steps to run the migration tool for Incorta Release 4.5 using Linux.

  1. Install Incorta Release 4.5.
  2. Update the cluster metadata by selecting the Upgrade Cluster Metadata button when you first log in to the CMC.
  3. Ensure the loader service and analytics service are not running.
  4. Ensure the metadata database is up and running.
  5. Run the tool migrateSnapshotsTool.sh in the IncortaNode directory on the machine where Incorta is installed.
  6. Choose whether to run for all or to enter the metadata database URL, username, password, tenants to migrate, schema names to migrate, file types to migrate, max threads (default 50% of available CPU), and max off-heap size (default 75% of machine memory).

Run the Migration Tool Using Windows

Perform the following steps to run the migration tool for Incorta Release 4.5 using Windows:

  1. Install Microsoft Visual C++ 2015 redistributable.
  2. Install Incorta Release 4.5.
  3. Update the cluster metadata by selecting the Upgrade Cluster Metadata button when you first log in to the CMC.
  4. Set the HADOOP_HOME environment variable to <Incorta_installation_path>/IncortaNode/hadoop.
  5. Add %HADOOP_HOME%/bin to the PATH environment variable (see the sketch after these steps).
  6. Ensure the loader service and analytics service are not running.
  7. Ensure the metadata database is up and running.
  8. Run the tool migrateSnapshotsTool.bat in the IncortaNode directory on the machine where Incorta is installed.
  9. Choose whether to run the tool for all, or enter the metadata database URL, username, password, tenants to migrate, schema names to migrate, file types to migrate, maximum threads (default: 50% of available CPUs), and maximum off-heap size (default: 75% of machine memory).
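
As a sketch, steps 4, 5, and 8 might look like this in a Command Prompt (the installation path placeholder depends on your environment; note that set affects only the current session, so use the System Properties dialog or setx to persist the variables):

set HADOOP_HOME=<Incorta_installation_path>\IncortaNode\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
cd <Incorta_installation_path>\IncortaNode
migrateSnapshotsTool.bat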

What to Expect After Running the Migration Tool

When the migration tool completes, it prints a result summary for each tenant you specified. If any failures occurred, they are listed with the results.

You must view the log file (named migrationTool.<timestamp>.log in the same directory as the Incorta Node) to see the results for individual tables.

The migration tool creates backup files of snapshots for each schema:

Original Snapshot Location | Backup Snapshot Location
snapshots/<schema_name>.*.zxt | snapshots/backup.<timestamp>/<schema_name>.all.zxt.zip
snapshots/<schema_name>.*.zxi | snapshots/backup.<timestamp>/<schema_name>.all.zxi.zip
snapshots/<schema_name>.*.*.zxc | snapshots/backup.<timestamp>/<schema_name>.all.zxc.zip

Migration Tool: Troubleshooting

Issue: If there is no existing .zxt file for a table, the table snapshot migration and index snapshot migration for the table will fail and you may see one of these errors:

ERROR: [15:30:50] [ebs_44 | EBS_AR_SNP | AR_AGING | Table Snapshot Migration] Unable to migrate table snapshot: EBS_AR_SNP.AR_AGING.zxt, because the file does not exist. Please load from staging. [com.incorta.engine.migration.MigrationTool45.migrateTableSnapshot]

ERROR: [15:30:50] [ebs_44 | EBS_AR_SNP | AR_AGING | Index Snapshot Migration] Unable to migrate index snapshot: EBS_AR_SNP.AR_AGING.zxi, because there is no corresponding table snapshot(.zxt). Please load from staging. [com.incorta.engine.migration.MigrationTool45.migrateIndexSnapshot]

Resolution: Load the table from staging.

Issue: You see a “Duplicate join for …” error like this:

ERROR: [15:30:04] [ebs_44 | EBS_AP] Duplicate join for EBS_AP.AP_BATCHES_ALL to EBS_FND_COMMON.FND_CURRENCIES [com.incorta.engine.metamodel.SchemaModel.addJoinDef]

Resolution: Ignore this error. It will not prevent migration.

Issue: After running the migration tool, you may see an error like this:

java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-502a238a-e939-492f-9564-5d773818cc15-libsnappyjava.so: /tmp/snappy-1.1.2-502a238a-e939-492f-9564-5d773818cc15-libsnappyjava.so: failed to map segment from shared object: Operation not permitted

Resolution: Remount /tmp with exec permission by running the following Linux command, then re-run the migration tool:

sudo mount /tmp -o remount,exec

Configure Spark to Work with Hadoop on Windows

If you plan to use Spark on Windows, you must perform the following steps to configure Spark. This allows you to use Spark with Hadoop on your Windows machine. If you use a Linux machine, you do not need to perform these or any additional steps to use Spark.

Requirements:

  • Microsoft Visual C++ 2015 redistributable
  • Spark without Hadoop
  • Hadoop 3.2

To configure external Spark to work with Hadoop on Windows:

  1. Install Incorta, but do not start the Cluster Management Console (CMC).
  2. Copy winutils.exe and hadoop.dll to the bin folder of Hadoop 3.2.
  3. Set the HADOOP_HOME environment variable to the Hadoop 3.2 folder.
  4. Add %HADOOP_HOME%/bin to the PATH environment variable.
  5. In a terminal, browse to the Hadoop 3.2 bin directory, then run hadoop classpath.
  6. Copy the classpath value to a text file.
  7. Run the hostname command.
  8. Copy the hostname value to a text file. You will use it as the hostname.
  9. Copy the output to spark-env.sh, which should look like this:

set SPARK_PUBLIC_DNS=(hostname)
set SPARK_MASTER_IP=(hostname)
set SPARK_MASTER_PORT=7077
set SPARK_MASTER_WEBUI_PORT=9091
set SPARK_WORKER_PORT=7078
set SPARK_WORKER_WEBUI_PORT=9092
set SPARK_WORKER_MEMORY=8g
set SPARK_DIST_CLASSPATH=(value of hadoop classpath copied as is)
  10. In the sbin folder of Spark 2.4.3, create two cmd files with the following names and contents:

    start-master.cmd:
    ../bin/spark-class org.apache.spark.deploy.master.Master

    start-slave.cmd:
    ../bin/spark-class org.apache.spark.deploy.worker.Worker spark://(hostname):7077
  11. Run start-master.cmd.
  12. Run start-slave.cmd.
  13. In the CMC, install the loader and analytics services.
  14. In Spark, select the external version and use spark://(hostname):7077 as the master.

Admin UI to CMC Details

In the Admin UI, server configuration options were under System Configuration > Server Configs. In the CMC, server configuration options are under Local > Cluster Configurations > Server Configurations.

In the Admin UI, default tenant configuration options were under System Configuration > Default Tenant Configs. In the CMC, default tenant configuration options are under Local > Cluster Configurations > Default Tenant Configurations.

In the Admin UI, individual tenant configurations were under Tenants. In the CMC, individual tenant configurations are available when you select Configure in the Configurations column when you view the list of tenants you set up.

The following options were removed from the Admin UI and are not available in the CMC:

  • Under Default Tenant Configurations > Advanced

    • Eager Load was removed; its behavior is now part of the new Warm Up Options configuration option.
  • Under Analytics / Loader Service

    • Removed ICC port because it is not used.

The following options from the Admin UI were changed in the CMC:

  • Under Default Tenant Configs > Security

    • Minimum Password Length default was changed to 5.
  • Analytics / Loader Service

    • Engine CPU Utilization (%) was renamed to CPU Utilization (%) to represent all CPU utilization assigned to Incorta.

The following new options were added to server and tenant configurations in the CMC:

  • Under Server Configurations > Spark Integration

    • Spark App control channel port. The port used to send a shutdown signal to the Spark SQL app if the Incorta server needs to shut down Spark.
  • Under Default Tenant Configurations > Advanced

    • Warmup Mode allows you to select what data to pre-load into memory after a restart. The options are None (don’t pre-load any data into memory), Business View Cols (pre-load business view columns only), Last used columns (pre-load last used columns only), All (pre-load all data). If you do not set this, Business View Cols is the default.
  • Under Default Tenant Configurations > Incorta Labs

    • Wall-E, a new Incorta Assist beta feature, allows you to preview Incorta blueprints in your environment. See blueprints.incorta.com.
    • CLAIM Server URL, a new Incorta Assist feature.

Known Issues and Troubleshooting

The following are known issues in this release:

  • A load filter does not load data when the load filter column is in Arabic. You cannot create a load filter in Arabic.
  • Creating a materialized view with a non-English schema or table name causes an error. You can only create SQL materialized views for schemas and tables with English names.
  • Replace Spark: to upgrade to this release, you must replace Spark with the new version bundled with Incorta and reapply any custom configurations you set.

Troubleshooting: Old Connections Do Not Close As Expected, Cluster Won’t Restart

Issue: The number of connections increases when you open new connections without closing the old connections. This can cause the number of connections to reach the maximum number set for MySQL and prevent the cluster from starting.

Resolution: Upgrade the MySQL connector driver to 5.1.48 or later.

