About the Versioning Migration Tool
Starting with release 5.1, shared storage uses a new directory structure. A load job for a physical schema creates file versions in this new directory structure. There are file versions for Direct Data Mapping files and Apache Parquet files.
After upgrading to release 5.1, you have two options for implementing this new directory structure for shared storage:
- Run the Versioning Migration Tool immediately after upgrade
- Perform a full load for every physical schema in a given tenant
The Versioning Migration Tool is a command-line tool that you run offline after upgrading the cluster metadata database, but before restarting the Analytics and Loader services. You can use the tool to migrate one or more tenants, physical schemas, or physical schema objects, including physical schema tables, Incorta SQL tables, Incorta Analyzer tables, and materialized views. You can run the tool more than once without affecting entities that are already migrated.
The tool is either a shell script file, versioningMigrationTool.sh, or a batch file, versioningMigrationTool.bat, depending on the installation environment: Linux or Windows, respectively.
Versioning Migration Tool access rights and context
You use the Versioning Migration Tool after upgrading from an older release of Incorta Direct Data Platform to release 5.1. To run the tool and migrate to the new directory structure, use a terminal application to access the Incorta host machine as a system administrator with root access. You must run the tool after upgrading the cluster metadata database and before starting the Incorta services.
You can upgrade the cluster metadata database and start the services using the Cluster Management Console (CMC).
If you have one or more tenants hosted on a virtual file system, such as Azure Data Lake Storage (ADLS), Hadoop Distributed File System (HDFS), or Google Cloud Storage (GCS), you must prepare the environment so that the Versioning Migration Tool can migrate the contents of these tenants.
For more information, see Considerations for tenants hosted on virtual file systems.
Versioning Migration Tool Running Modes
You can run the Versioning Migration Tool in either interactive mode or unattended mode.
Interactive mode
In this mode, you provide the tool with all the arguments required to migrate the cluster content, such as the Incorta metadata database connection string, the specific tenants, the specific physical schemas, and, if required, the backup path. You can run the tool in interactive mode either to migrate the cluster content or to create a .properties file that you can use later to run the tool in unattended mode.
To use the Versioning Migration Tool to migrate to the new directory structure while running it in interactive mode, follow these steps:
- In the terminal of the host for the Incorta Node that runs the Loader Service or Analytics Service, switch to the incorta user and navigate to the installation path.
Linux OS example:
sudo su incorta
cd /home/incorta/IncortaAnalytics/IncortaNode
- For Linux operating systems, use the following command to run the tool:
./versioningMigrationTool.sh
- For Windows operating systems, use the following command to run the tool:
versioningMigrationTool.bat
- When prompted, enter the required information. See Versioning Migration Tool Parameters for more details.
- When prompted, press Enter to start the migration process.
- Wait for the Versioning Migration Tool to complete the backup and migration processes, as applicable.
To use the tool to create the parameters .properties file, provide all the required information and, when prompted, enter S and then press Enter to save the parameters to a .properties file and exit the tool. The path of the resulting file is /home/IncortaAnalytics/IncortaNode/ and the file name follows this naming convention: migration.<date>-<timestamp>.properties, for example, migration.20210526-152237.properties.
Unattended mode
In this mode, when you run the tool, you provide the .properties file that contains the parameters required to migrate the cluster directory structure. You can create this file using either a text editor or the tool itself while running it in interactive mode. When the tool runs in unattended mode, it reads the required information from the .properties file.
The following is an example of the .properties file:
dbURL=jdbc:mysql://127.0.0.1:3306/incorta_metadata
dbUser=user
dbPassword=1234
tenants=demo
schemas=HR
tablePattern=
backupDate=true
backupPath=
maxThreads=10
For more information about the parameters to include in the .properties file, review Versioning Migration Tool Parameters.
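Before an unattended run, it can help to confirm that a hand-edited parameters file still defines every key. The following is a minimal sketch, not part of the tool: the function name check_migration_properties is hypothetical, and it assumes that each key listed in this document should be present in the file even when its value is left blank.

```shell
# Hypothetical pre-flight check (not shipped with the tool): confirm that a
# parameters file defines every key listed in this document.
# Prints each missing key and returns non-zero if any key is absent.
check_migration_properties() {
  props_file="$1"
  missing=0
  for key in dbURL dbUser dbPassword tenants schemas tablePattern \
             backupDate backupPath maxThreads; do
    # A key counts as present even when its value is blank, e.g. "backupPath=".
    if ! grep -q "^${key}=" "$props_file"; then
      echo "missing key: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_migration_properties migration.20210526-152237.properties prints nothing and returns 0 when all nine keys are present.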
To use the Versioning Migration Tool to migrate to the new directory structure while running it in unattended mode, follow these steps:
- In the terminal for the Incorta node host, whether the Loader Node or the Analytics Node, switch to the incorta user and navigate to the installation path:
sudo su incorta
cd /home/incorta/IncortaAnalytics/IncortaNode
- Run the Versioning Migration Tool and provide the path and name of the parameters .properties file.
- For Linux operating systems, use the following command to run the tool:
./versioningMigrationTool.sh /<File_Path>/<File_Name>
Example:
./versioningMigrationTool.sh /home/IncortaAnalytics/IncortaNode/migration.20210526-152237.properties
- For Windows operating systems, use the following command to run the tool:
versioningMigrationTool.bat /<File_Path>/<File_Name>
Example:
versioningMigrationTool.bat /home/IncortaAnalytics/IncortaNode/migration.20210526-152237.properties
- Wait for the Versioning Migration Tool to complete the backup and migration processes, as applicable.
Versioning Migration Tool Parameters
The following table shows the parameters that must be available in the .properties file or that you provide when prompted in interactive mode:
Parameter | Description | Example |
---|---|---|
dbURL | Enter the database connection string in a suitable format depending on the database management system | jdbc:mysql://127.0.0.1:3306/incorta_metadata |
dbUser | Enter the database user | |
dbPassword | Enter the database user password | |
tenants | Enter the tenants whose files you want to migrate, separated by a space or a comma. Leave blank to migrate all tenants. | demo,foundations,casestudy |
schemas | Enter the schemas whose files you want to migrate, separated by a space or a comma. Leave blank to migrate all schemas. | SALES,HR |
tablePattern | Enter the fully qualified name (schemaName.objectName) of the physical schema object whose files you want to migrate. Leave blank to migrate all objects. You can use a regular expression to specify multiple physical schema objects. | SALES.Products, or .*\.emp.* to include all objects with “emp” in their names in the selected tenants and physical schemas |
backupDate | Enter true or false to specify whether the tool creates a backup of the original tenant data (snapshot and parquet directories) before altering the directory structure | true or false |
backupPath | Enter the backup directory path. This can be a directory on the host machine or a shared drive on a cloud service that Incorta has access to. Leave blank to create the backup file for each tenant under its directory. | |
maxThreads | Enter the maximum number of worker threads that the tool can use during the migration process. Leave blank so that the tool automatically calculates the value. | |
During the migration process, the tool must have access to the Incorta metadata database. The tool queries the database to determine the tenants, related physical schemas, and related entity objects. It also inserts rows into two new tables required for file versioning: FILES_VERSIONS and VERSION_LOCK. The Analytics and Loader Services read from these tables to determine which files to load into memory.
If you provide a value for the maxThreads property that exceeds the number of worker threads available on the host machine, the tool uses all the available worker threads.
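To preview which objects a tablePattern value might select, you can test the regular expression against a list of fully qualified names before running the tool. This is only a sketch: the helper name preview_table_pattern and the sample object names are hypothetical, and it assumes that the pattern must match the whole schemaName.objectName string using POSIX extended regular expressions. The tool's own regular expression engine is not described here and may behave differently.

```shell
# Hypothetical helper: reads fully qualified object names (schemaName.objectName)
# from standard input and prints those that the pattern matches as a whole line.
# Assumes whole-name matching with POSIX extended regex syntax.
preview_table_pattern() {
  grep -E -x -- "$1"
}

# Sample object names invented for illustration.
printf '%s\n' SALES.Products HR.employees HR.departments |
  preview_table_pattern '.*\.emp.*'
# prints: HR.employees
```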
Additional Considerations
Disk space and backup considerations
Before using the Versioning Migration Tool, you must ensure that there is adequate disk space in shared storage.
If you create a backup prior to the migration, you need to account for the backup size as part of your disk space calculations. The default backup directory is the tenant directory. You can specify a different directory.
The backup file will contain the "parquet" and "snapshots" directories. The tool backs up the files for all physical schemas and tables in these directories, not only the ones selected for migration.
To determine the size of the existing files in shared storage, such as Parquet files, run the following Linux bash shell commands:
cd ~/IncortaAnalytics/Tenants/
du -sh */parquet | sort -hr
You can create a backup of each tenant’s data yourself instead of instructing the tool to create this backup. You can restore the old directory structure, if needed, and run the Versioning Migration Tool to restart the migration process.
Depending on the size of the shared storage files, the backup process may take a significant amount of time.
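The disk space check can be roughly automated by comparing the combined size of the directories that go into the backup with the free space on the backup target. The sketch below is an estimate only: the function name check_backup_space is hypothetical, the directory names "parquet" and "snapshots" are taken from the backup description above, and the actual backup file may be larger or smaller than the raw directory sizes. It also does not account for the space that the migration itself needs.

```shell
# Hypothetical helper: compares the combined size of every tenant's "parquet"
# and "snapshots" directories with the free space on the backup target.
# Arguments: <tenants_dir> <backup_target_dir>. Sizes are in kilobytes.
check_backup_space() {
  tenants_dir="$1"
  target_dir="$2"
  # Sum the sizes of the directories that the backup would contain.
  needed_kb=$(du -sk "$tenants_dir"/*/parquet "$tenants_dir"/*/snapshots 2>/dev/null |
    awk '{ total += $1 } END { printf "%.0f\n", total }')
  # Column 4 of "df -Pk" output is the available space in kilobytes.
  free_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
  echo "needed: ${needed_kb} KB, free: ${free_kb} KB"
  # Succeed only when the target has more free space than the estimate.
  [ "$free_kb" -gt "$needed_kb" ]
}
```

For example, check_backup_space ~/IncortaAnalytics/Tenants /backup prints both figures and returns non-zero when the estimate exceeds the free space; the /backup path is a placeholder.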
Considerations for tenants hosted on virtual file systems
If you have one or more tenants hosted on a virtual file system (VFS), such as ADLS, HDFS, or GCS, you must prepare for the Versioning Migration Tool to allow it to migrate the contents of these tenants.
To prepare for the migration of tenants hosted on a virtual file system, follow these steps that are applicable to all supported virtual file systems:
- Copy the core-site.xml file to the following locations:
<installation_path>/cmc/lib/
<installation_path>/cmc/tmt
<installation_path>/IncortaNode/runtime/lib/
<installation_path>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib/
<installation_path>/IncortaNode/hadoop/etc/hadoop/
You can use the cp command to perform this task.
Example: cp /incorta/core-site.xml /home/incorta/IncortaAnalytics/cmc/lib/
- For ADLS only, set the environment variables in ~/.bash_profile or ~/.bashrc as follows:
export AZURE_CLIENT_ID=<your_Azure_Client_ID>
export AZURE_CLIENT_SECRET_KEY=<your_Azure_Client_Secret_Key>
export AZURE_TENANT_ID=<your_Azure_Tenant_ID>
- For all supported virtual file systems, inject the core-site.xml file into incorta.engine.tools.jar:
  - Navigate to the following directory: <installation_path>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib. The default installation path is /home/incorta/IncortaAnalytics.
  - Run the following command:
jar uf incorta.engine.tools.jar core-site.xml
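The copy step in the procedure above repeats the same cp command for five destinations, so it can be wrapped in a small loop. This is a minimal sketch under stated assumptions: the function name copy_core_site is hypothetical, and the destination list simply mirrors the locations named in the first step, relative to the installation path.

```shell
# Hypothetical helper: copies core-site.xml into each location listed in the
# first preparation step. Arguments: <path_to_core-site.xml> <installation_path>.
copy_core_site() {
  src="$1"
  install_path="$2"
  for dest in \
    cmc/lib \
    cmc/tmt \
    IncortaNode/runtime/lib \
    IncortaNode/runtime/webapps/incorta/WEB-INF/lib \
    IncortaNode/hadoop/etc/hadoop; do
    # Stop at the first failure so that a missing directory is noticed.
    cp "$src" "$install_path/$dest/" || return 1
  done
}
```

For example, copy_core_site /incorta/core-site.xml /home/incorta/IncortaAnalytics copies the file to all five directories under the default installation path.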
After a successful migration
When the Versioning Migration Tool completes the migration process, the migration result summary appears showing migrated, failed, and skipped entities. You can run the tool more than once to migrate failed entities.
Old structure content
After completing the migration successfully and starting the services, it is recommended that you delete unneeded directories and files including the following:
- The tenant backup files
- The snapshot directory
- The parquet directory
- The loadTime.log file
The Versioning Migration Tool log files
By default, the Versioning Migration Tool creates log files in the <installation_path>/IncortaNode and <installation_path>/IncortaNode/migration directories. You can use the log files to determine the results of the migration process.
The tool logging configurations are available in the migration-logging.properties file in the IncortaNode directory. You can change the default configurations by editing this .properties file.
Log files created under the IncortaNode directory start with the prefix versioningMigrationTool followed by a timestamp.
Log files created under the migration directory start with the prefix specified in the migration-logging.properties file, also followed by a timestamp. The default prefix is incorta-migration.
Considerations for migrating files between different environments
When migrating shared storage files from one Incorta cluster to another, for example, from User Acceptance Testing (UAT) to Production, you must first copy the parquet (source) folder and then perform a load from staging. Both environments must run an Incorta release that supports file versioning and the copied files should not have records in the FILES_VERSIONS or VERSION_LOCK metadata database tables.
Copying only the ddm and source folders from shared storage between the different environments does not produce the same result as copying the source folder and then loading data from staging.
The default maxThreads value calculation
The Versioning Migration Tool calculates the default maxThreads value based on the following equation:
(CPU cores utilization * machine available processors) / 100
The initial CPU cores utilization value is the engine.cpu_cores_utilization value in the <installation_path>/IncortaNode/services/<service_directory>/incorta/engine.properties file. However, the tool can use another derived value as follows:
- If the tool does not find the engine.cpu_cores_utilization value, the default is 50.
- If the value is less than 10, the tool considers it 10.
- If the value is greater than 100, the tool considers it 100.
If the result of the default maxThreads calculation is less than 1, the default value is 1.
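The rule above can be worked through with a short shell function. This is an illustrative re-implementation, not the tool's code: the function name default_max_threads is hypothetical, and it assumes integer division, because the tool's exact rounding is not stated here.

```shell
# Hypothetical re-implementation of the default maxThreads rule.
# Arguments: <cpu_cores_utilization, or empty if not set> <available_processors>.
# Assumes integer division; the tool's rounding behavior is not documented here.
default_max_threads() {
  utilization="${1:-50}"   # default when engine.cpu_cores_utilization is absent
  processors="$2"
  # Clamp the utilization percentage to the range 10..100.
  if [ "$utilization" -lt 10 ]; then utilization=10; fi
  if [ "$utilization" -gt 100 ]; then utilization=100; fi
  threads=$(( utilization * processors / 100 ))
  # The result never drops below one worker thread.
  if [ "$threads" -lt 1 ]; then threads=1; fi
  echo "$threads"
}

default_max_threads "" 8    # prints 4  (50 * 8 / 100)
default_max_threads 5 4     # prints 1  (clamped to 10, then 10 * 4 / 100 = 0, raised to 1)
default_max_threads 150 16  # prints 16 (clamped to 100)
```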