Configure Incorta Data Delivery for Azure Synapse
About Incorta Data Delivery for Azure Synapse
Following is the high-level Incorta Data Delivery process for Microsoft Azure Synapse Analytics:
- Connect Incorta to your data source with a data application, and load pre-existing physical schema fact and dimension tables.
- Copy the physical schema tables to Parquet files that reside on Azure Data Lake Storage (ADLS) Gen2.
- Load Synapse data marts from the Parquet files.
- Visualize the data with Power BI.
Get Started
Following are the steps to get started:
- Review prerequisites.
- Configure Your Azure Synapse Analytics SQL Pool
- Create an Azure Synapse Data Destination
- Set a Data Destination for a Physical Schema
When naming tenants or data destinations, it is best practice to use a generic name, for example, "Incorta" rather than "IncortaDev". Doing so is helpful if you later need to import or export tenants or perform data destination integrations.
All tables with keys use the Hash distribution option; by default, the first key is used as the distribution column. All tables use clustered columnstore indexes by default. Optional table-level configurations are described in the Optional table level settings section.
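For reference, the table definitions created in Synapse are typically equivalent to DDL of the following form. The schema, table, column, and key names below are illustrative only:
CREATE TABLE [incorta_schema].[SALES]
(
    [SALE_ID] bigint NOT NULL,
    [AMOUNT] decimal(18,2)
)
WITH
(
    DISTRIBUTION = HASH([SALE_ID]),
    CLUSTERED COLUMNSTORE INDEX
);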
Configure Your Azure Synapse Analytics SQL Pool
Perform the following configuration steps in your Azure Synapse Analytics SQL Pool before you load data into Synapse from Incorta:
- Create the Incorta login for your SQL pool for the master database
- Create the Incorta user with grants for the SQL pool database
It is assumed that you already have a Synapse workspace and dedicated SQL Pool from the Azure blade.
Create the Incorta login for your SQL pool for the master database
Here are the steps to create an Incorta login for your SQL pool for the master database. You must replace <PASSWORD> with a secure password.
- In Synapse Analytics, for your workspace, select New → SQL script.
- In the SQL editor, in the SQL Script tab, in Connect to, first select your dedicated SQL Pool and then, in Use database, select master.
- In the SQL editor, run the following CREATE LOGIN command, replacing <PASSWORD> with a secure password for the new login.
CREATE LOGIN incorta WITH PASSWORD = '<PASSWORD>';
You will need to provide this username and password when you create an Azure Synapse Data Destination.
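To confirm the login was created, you can run a query such as the following against the master database; sys.sql_logins is a standard catalog view in Azure Synapse dedicated SQL pools:
SELECT name, create_date FROM sys.sql_logins WHERE name = 'incorta';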
Create the Incorta user with grants for the SQL pool database
Here are the steps to create an Incorta user with grants for the selected SQL pool database:
- In Synapse Analytics, in Develop, select + → SQL script.
- In the SQL editor, in the SQL Script tab, in Connect to, first select your dedicated SQL Pool and then, in Use database, select your SQL pool database.
- In the SQL editor canvas, run the following SQL commands:
CREATE USER incorta FOR LOGIN incorta;
EXEC sp_addrolemember 'db_ddladmin', 'incorta';
EXEC sp_addrolemember 'db_datawriter', 'incorta';
EXEC sp_addrolemember 'db_datareader', 'incorta';
GRANT ALTER ANY EXTERNAL FILE FORMAT to incorta;
GRANT ADMINISTER DATABASE BULK OPERATIONS to incorta;
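Optionally, to verify the user and its role memberships, you can run a query such as the following against the same SQL pool database:
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members rm
JOIN sys.database_principals r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals m ON rm.member_principal_id = m.principal_id
WHERE m.name = 'incorta';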
Create an Azure Synapse Data Destination
When you create an Azure Synapse Data Destination, you can seamlessly ingest physical schema tables from Incorta into your Azure Synapse instance.
Here are the steps:
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Destination.
- In the Choose a Data Destination dialog, in Schema Destination select Synapse Analytics.
- In the New Data Destination dialog, specify the following Azure Synapse schema destination properties:
Property | Control | Description |
---|---|---|
Name your data destination | text box | Enter your data destination name. |
SQL Pool Endpoint | text box | Enter the dedicated SQL endpoint. |
Database name | text box | Enter the SQL pool name. |
SQL Username | text box | Enter the username created in the Create the Incorta login for your SQL pool for the master database section above. |
SQL Password | text box | Enter the password created in the Create the Incorta login for your SQL pool for the master database section above. |
Port | text box | Enter the SQL port number. The default port value is 1433. |
Connection Pool | text box | Enter the maximum number of concurrent tables being sent. The default value is 30. |
Authentication Type | dropdown list | Choose an authentication type from the following options: ● Managed Identity (default) ● Service Principal ● Storage Account Key For more information, refer to the Azure Synapse documentation. |
Connection Properties | text box | Optionally enter connector properties for a custom connection to Synapse Analytics in the format propertyName=propertyValue, with each connector property on a new line. For example: ● loginTimeout=30 ● ApplicationName="Incorta" |
- To test the connection, select Test Connection.
- To save, select Ok.
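As an illustration, for a Synapse workspace named myworkspace (a placeholder name), the dedicated SQL Pool Endpoint typically has the form myworkspace.sql.azuresynapse.net, and the Connection Properties box might contain:
loginTimeout=30
ApplicationName=Incorta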
For more information on how to manage Data Destinations, refer to Tools → Data Manager.
Set a Data Destination for a Physical Schema
Set a data destination for physical schemas that will push tables to Azure Synapse. Here are the steps:
- In the Navigation bar, select Schema.
- In the Action bar, select Settings (gear icon) → Set Data Destination.
- In the Set Schema Destination for <SCHEMA_NAME> dialog:
- Select a Schema Destination or select + New Schema Data Destination to create a new one.
- Optionally, enter Target Schema Name, which is the schema name you want to create in the data destination. The default value is the schema name in Incorta.
Note: If the schema name already exists in the destination, the newly sent schema will overwrite the existing one.
- Optionally, enter a value in the Target Table Prefix to add as a prefix to the table names.
- Enter Default String Length to set the maximum column string length; the maximum value is 4000 (see the example query after these steps).
Warning: If you enter a default string length smaller than the longest string value in your schema, the destination will reject the sent schema.
- Select Done.
A Synapse Analytics icon will appear to the left of the schema name to indicate a data destination has been set for the schema.
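To choose a suitable Default String Length, you can check the longest string value in the relevant source columns before loading. The following query is only a sketch against a hypothetical source table; the table and column names are placeholders, and the length function varies by source database (for example, LEN in SQL Server, LENGTH in Oracle or MySQL):
SELECT MAX(LENGTH(CUSTOMER_NAME)) AS max_length FROM SALES.CUSTOMERS;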
Loading data
When you load your schema using a full or incremental load, Incorta sends the schema to the data destination automatically. This adds a new stage to the load job details viewer: Send to Destination. For more details, refer to the Load Job Details Viewer.
You can schedule load jobs that trigger data ingest into Synapse to run in a specific order. For more information, refer to Tools → Scheduler.
If you load multiple schemas in one job and at least one of them has a data destination set, the Send to Destination stage will always appear.
Optional configurations
Throughout the configuration process there are several optional configurations you can add or modify from the default settings.
Optional table level settings
You can change some settings at the table level, including the table's hash distribution key and indexing type.
If you want to change the hash distribution key, add the following line to the synapse_mappings.csv file:
<SCHEMA_NAME>,<TABLE_NAME>,<HASH_COLUMN>,false
If you want to change the indexing type of a table to Heap, add the following line to the synapse_mappings.csv file:
<SCHEMA_NAME>,<TABLE_NAME>,,true
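For example, a synapse_mappings.csv using illustrative schema and table names could contain both kinds of entries: the first line changes the hash distribution column of SALES.ORDERS to ORDER_ID, and the second switches SALES.ORDER_LINES to a Heap table:
SALES,ORDERS,ORDER_ID,false
SALES,ORDER_LINES,,true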
Limitations
Data Delivery limitations
- Encrypted columns are not supported and will be ingested into Synapse as strings, exactly as they exist in Parquet.
Synapse limitations
- Hashing, the methodology used for deduplication and key definition, supports a single key only. Composite keys are not supported.
- A client error will be thrown if a date or timestamp column with a value earlier than January 1, 1970 is ingested.
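If you suspect a source column may contain such values, a quick pre-load check like the following can identify offending rows. The table and column names are placeholders; adjust the syntax for your source database:
SELECT COUNT(*) AS early_dates FROM SALES.ORDERS WHERE ORDER_DATE < '1970-01-01';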