Concepts → Multi-Source Table

A physical schema table may have more than one data source. A multi-source table allows a schema developer to union disparate datasets into one physical schema table. For example, a multi-source table may define one source as a File System dataset and another as a SQL Database dataset.

A multi-source table represents a parallelized workload. The parallelized extraction optimizes the load performance and simulates the parallelism afforded to connectors that support chunking.

When using multi-source tables, consider the following:

  • Column names are case-sensitive. Thus, columns will not be mapped unless they have the same exact name and letter case.

  • Mapped columns should have a common data type or can be successfully type cast. See Type Conflicts for more information.

  • When possible, define a column to indicate the source itself to help debug any issues with that source.

  • A key column enforces row uniqueness for the union of datasets.

  • If enabling an incremental load, you must enable the Incremental property for all the data sources in the multi-source table. In addition, if you specify the Maximum Value of a Column for the Incremental Extract Using property, the column you specify for the corresponding Incremental Column must have the same range for all data sources. If this is not the case, specify the Last Successful Extract Time.

Create a multi-source table

After you create a physical schema table you can then add additional data sources to the table.

Following are the steps to create a multi-source table:

  • From the Schema Manager, select the desired physical schema.
  • Select the desired table from the physical schema.
  • Select the large plus icon in the Data Sources section.
  • Complete the steps to add an additional data source to the table.

Manage multi-source output

Once a table has multiple data sources, the Manage Output option will be available next to the table name in the Table Editor. The Manage Output dialog will allow you to see the data source for each column and edit the column data type.

Data-source conflicts

If a conflict exists between the two data sources, Incorta will provide a warning in the Table Editor. Conflicts in the data sources must be rectified before the table will load successfully. Conflicts can arise from column function, data type, ordering, or naming.

Column merging by name

If columns from different data sources have the same exact name, Incorta will merge them into a single column for your multi-source table.

Ordering Conflicts

When creating a multi-source table, columns from source tables will be merged if they have the same names and are in the same column order. You will receive a warning if there is an ordering conflict between the tables and be prompted to correct the ordering conflict.

Function Conflicts

When two columns are matched by Incorta to merge, Incorta will automatically cast the column function if there is a conflict. You can use the Manage Output menu to customize the column function if needed.

The following table shows how Incorta will automatically resolve column function conflicts:

Column_1 in data_set_1Column_1 in data_set_2Casting
keykeykey
keydimensionkey
keymeasurekey
dimensiondimensiondimension
dimensionmeasuredimension
measuremeasuremeasure

Type Conflicts

When there is a conflict in the column type Incorta will automatically resolve most data type conflicts. There are several cases where you will be prompted to resolve the type conflict.

Incorta will automatically perform data type casting when needed or you can perform manual type casting. When you perform manual type casting, you need to ensure data can be cast into the new data type.

Starting with the 22.1.0 release, you can lock the data type of one or more output columns in a multi-source table. Locking the output column data types prevents Incorta from automatically casting them when validating the data sources.

The following table shows the automatic type casting that will occur and when the user will be prompted to manually select the column data type:

Column_1 in data_set_1Column_1 in data_set_2Casting
integerintegerinteger
integerlonglong
integerdoubledouble
integerstringstring
integernullinteger
integertimestampUser will be prompted to resolve type conflict
integerdateUser will be prompted to resolve type conflict
integertexttext
longintegerlong
longlonglong
longdoubledouble
longstringstring
longnulllong
longtimestampUser will be prompted to resolve type conflict
longdateUser will be prompted to resolve type conflict
longtexttext
doubleintegerdouble
doublelongdouble
doubledoubledouble
doublestringstring
doublenulldouble
doubletimestampUser will be prompted to resolve type conflict
doubledateUser will be prompted to resolve type conflict
doubletexttext
stringintegerstring
stringlongstring
stringdoublestring
stringstringstring
stringnullstring
stringtimestampstring
stringdatestring
stringtexttext
nullintegerinteger
nulllonglong
nulldoubledouble
nullstringstring
nullnullnull
nulltimestamptimestamp
nulldatedate
nulltexttext
timestampintegerUser will be prompted to resolve type conflict
timestamplongUser will be prompted to resolve type conflict
timestampdoubleUser will be prompted to resolve type conflict
timestampstringstring
timestampnulltimestamp
timestamptimestamptimestamp
timestampdatetimestamp
timestamptexttext
dateintegerUser will be prompted to resolve type conflict
datelongUser will be prompted to resolve type conflict
datedoubleUser will be prompted to resolve type conflict
datestringstring
datenulldate
datetimestamptimestamp
datedatedate
datetexttext
textintegertext
textlongtext
textdoubletext
textstringtext
textnulltext
texttimestamptext
textdatetext
texttexttext