Concepts → Multi-Source Table

A physical schema table may have more than one data source. A multi-source table allows a schema developer to union disparate datasets into one physical schema table. For example, a multi-source table may define one source as a File System dataset and another as a SQL Database dataset.

A multi-source table represents a parallelized workload. The parallelized extraction optimizes the load performance and simulates the parallelism afforded to connectors that support chunking.

When using multi-source tables, consider the following:

Column names are case-sensitive. Thus, columns will not be mapped unless they have the same exact name and letter case.
Mapped columns should have a common data type or can be successfully type cast. See Type Conflicts for more information.
When possible, define a column to indicate the source itself to help debug any issues with that source.
A key column enforces row uniqueness for the union of datasets.
If enabling an incremental load, you must enable the Incremental property for all the data sources in the multi-source table. In addition, if you specify the Maximum Value of a Column for the Incremental Extract Using property, the column you specify for the corresponding Incremental Column must have the same range for all data sources. If this is not the case, specify the Last Successful Extract Time.

Create a multi-source table

After you create a physical schema table you can then add additional data sources to the table.

Following are the steps to create a multi-source table:

From the Schema Manager, select the desired physical schema.
Select the desired table from the physical schema.
Select the large plus icon in the Data Sources section.
Complete the steps to add an additional data source to the table.

Manage multi-source output

Once a table has multiple data sources, the Manage Output option will be available next to the table name in the Table Editor. The Manage Output dialog will allow you to see the data source for each column and edit the column data type.

Data-source conflicts

If a conflict exists between the two data sources, Incorta will provide a warning in the Table Editor. Conflicts in the data sources must be rectified before the table will load successfully. Conflicts can arise from column function, data type, ordering, or naming.

Column merging by name

If columns from different data sources have the same exact name, Incorta will merge them into a single column for your multi-source table.

Ordering Conflicts

When creating a multi-source table, columns from source tables will be merged if they have the same names and are in the same column order. You will receive a warning if there is an ordering conflict between the tables and be prompted to correct the ordering conflict.

Function Conflicts

When two columns are matched by Incorta to merge, Incorta will automatically cast the column function if there is a conflict. You can use the Manage Output menu to customize the column function if needed.

The following table shows how Incorta will automatically resolve column function conflicts:

Column_1 in data_set_1	Column_1 in data_set_2	Casting
key	key	key
key	dimension	key
key	measure	key
dimension	dimension	dimension
dimension	measure	dimension
measure	measure	measure

Type Conflicts

When there is a conflict in the column type Incorta will automatically resolve most data type conflicts. There are several cases where you will be prompted to resolve the type conflict.

Incorta will automatically perform data type casting when needed or you can perform manual type casting. When you perform manual type casting, you need to ensure data can be cast into the new data type.

Starting with the 22.1.0 release, you can lock the data type of one or more output columns in a multi-source table. Locking the output column data types prevents Incorta from automatically casting them when validating the data sources.

The following table shows the automatic type casting that will occur and when the user will be prompted to manually select the column data type:

Column_1 in data_set_1	Column_1 in data_set_2	Casting
integer	integer	integer
integer	long	long
integer	double	double
integer	string	string
integer	null	integer
integer	timestamp	User will be prompted to resolve type conflict
integer	date	User will be prompted to resolve type conflict
integer	text	text
long	integer	long
long	long	long
long	double	double
long	string	string
long	null	long
long	timestamp	User will be prompted to resolve type conflict
long	date	User will be prompted to resolve type conflict
long	text	text
double	integer	double
double	long	double
double	double	double
double	string	string
double	null	double
double	timestamp	User will be prompted to resolve type conflict
double	date	User will be prompted to resolve type conflict
double	text	text
string	integer	string
string	long	string
string	double	string
string	string	string
string	null	string
string	timestamp	string
string	date	string
string	text	text
null	integer	integer
null	long	long
null	double	double
null	string	string
null	null	null
null	timestamp	timestamp
null	date	date
null	text	text
timestamp	integer	User will be prompted to resolve type conflict
timestamp	long	User will be prompted to resolve type conflict
timestamp	double	User will be prompted to resolve type conflict
timestamp	string	string
timestamp	null	timestamp
timestamp	timestamp	timestamp
timestamp	date	timestamp
timestamp	text	text
date	integer	User will be prompted to resolve type conflict
date	long	User will be prompted to resolve type conflict
date	double	User will be prompted to resolve type conflict
date	string	string
date	null	date
date	timestamp	timestamp
date	date	date
date	text	text
text	integer	text
text	long	text
text	double	text
text	string	text
text	null	text
text	timestamp	text
text	date	text
text	text	text

Content