Concepts → Multi-Source Table
A physical schema table may have more than one data source. A multi-source table allows a schema developer to union disparate datasets into one physical schema table. For example, a multi-source table may define one source as a File System
dataset and another as a SQL Database
dataset.
A multi-source table represents a parallelized workload. The parallelized extraction optimizes the load performance and simulates the parallelism afforded to connectors that support chunking.
When using multi-source tables, consider the following:
Column names are case-sensitive. Thus, columns will not be mapped unless they have the same exact name and letter case.
Mapped columns should have a common data type or can be successfully type cast. See Type Conflicts for more information.
When possible, define a column to indicate the source itself to help debug any issues with that source.
A key column enforces row uniqueness for the union of datasets.
If enabling an incremental load, you must enable the Incremental property for all the data sources in the multi-source table. In addition, if you specify the
Maximum Value of a Column
for theIncremental Extract Using
property, the column you specify for the correspondingIncremental Column
must have the same range for all data sources. If this is not the case, specify theLast Successful Extract Time
.
Create a multi-source table
After you create a physical schema table you can then add additional data sources to the table.
Following are the steps to create a multi-source table:
- From the Schema Manager, select the desired physical schema.
- Select the desired table from the physical schema.
- Select the large plus icon in the Data Sources section.
- Complete the steps to add an additional data source to the table.
Manage multi-source output
Once a table has multiple data sources, the Manage Output option will be available next to the table name in the Table Editor. The Manage Output dialog will allow you to see the data source for each column and edit the column data type.
Data-source conflicts
If a conflict exists between the two data sources, Incorta will provide a warning in the Table Editor. Conflicts in the data sources must be rectified before the table will load successfully. Conflicts can arise from column function, data type, ordering, or naming.
Column merging by name
If columns from different data sources have the same exact name, Incorta will merge them into a single column for your multi-source table.
Ordering Conflicts
When creating a multi-source table, columns from source tables will be merged if they have the same names and are in the same column order. You will receive a warning if there is an ordering conflict between the tables and be prompted to correct the ordering conflict.
Function Conflicts
When two columns are matched by Incorta to merge, Incorta will automatically cast the column function if there is a conflict. You can use the Manage Output menu to customize the column function if needed.
The following table shows how Incorta will automatically resolve column function conflicts:
Column_1 in data_set_1 | Column_1 in data_set_2 | Casting |
---|---|---|
key | key | key |
key | dimension | key |
key | measure | key |
dimension | dimension | dimension |
dimension | measure | dimension |
measure | measure | measure |
Type Conflicts
When there is a conflict in the column type Incorta will automatically resolve most data type conflicts. There are several cases where you will be prompted to resolve the type conflict.
Incorta will automatically perform data type casting when needed or you can perform manual type casting. When you perform manual type casting, you need to ensure data can be cast into the new data type.
Starting with the 22.1.0 release, you can lock the data type of one or more output columns in a multi-source table. Locking the output column data types prevents Incorta from automatically casting them when validating the data sources.
The following table shows the automatic type casting that will occur and when the user will be prompted to manually select the column data type:
Column_1 in data_set_1 | Column_1 in data_set_2 | Casting |
---|---|---|
integer | integer | integer |
integer | long | long |
integer | double | double |
integer | string | string |
integer | null | integer |
integer | timestamp | User will be prompted to resolve type conflict |
integer | date | User will be prompted to resolve type conflict |
integer | text | text |
long | integer | long |
long | long | long |
long | double | double |
long | string | string |
long | null | long |
long | timestamp | User will be prompted to resolve type conflict |
long | date | User will be prompted to resolve type conflict |
long | text | text |
double | integer | double |
double | long | double |
double | double | double |
double | string | string |
double | null | double |
double | timestamp | User will be prompted to resolve type conflict |
double | date | User will be prompted to resolve type conflict |
double | text | text |
string | integer | string |
string | long | string |
string | double | string |
string | string | string |
string | null | string |
string | timestamp | string |
string | date | string |
string | text | text |
null | integer | integer |
null | long | long |
null | double | double |
null | string | string |
null | null | null |
null | timestamp | timestamp |
null | date | date |
null | text | text |
timestamp | integer | User will be prompted to resolve type conflict |
timestamp | long | User will be prompted to resolve type conflict |
timestamp | double | User will be prompted to resolve type conflict |
timestamp | string | string |
timestamp | null | timestamp |
timestamp | timestamp | timestamp |
timestamp | date | timestamp |
timestamp | text | text |
date | integer | User will be prompted to resolve type conflict |
date | long | User will be prompted to resolve type conflict |
date | double | User will be prompted to resolve type conflict |
date | string | string |
date | null | date |
date | timestamp | timestamp |
date | date | date |
date | text | text |
text | integer | text |
text | long | text |
text | double | text |
text | string | text |
text | null | text |
text | timestamp | text |
text | date | text |
text | text | text |