
About the Data Files connector

The Data Files connector allows you to connect to a local data file or a local data folder that has one or more local data files.

A local data file is a file that has been uploaded to a specific tenant in an Incorta Cluster. A local data folder is a folder that has been uploaded to a specific tenant in an Incorta Cluster. Using the Data Manager, you can upload one or more data files and folders to Shared Storage.

Note: LocalFiles differs from the Local Files connector

The Local Files connector is similar in name to the built-in LocalFiles data source. Unlike the Data Files connector, you can create an external data source using the Local Files connector. With the Local Files connector, you specify a host directory with files or subdirectories. For example, you can specify a shared mount as the directory path. In addition, the Local Files connector supports the Remote tables configuration and file types such as Parquet (.parquet) and Optimized Row Columnar (.orc). For this reason, the Local Files connector is in the category of Data Lake connectors.

The Data Files connector supports the following file extensions:

| File Extension | File Type | Example | Notes |
| --- | --- | --- | --- |
| .csv | comma-separated values | sales.csv | Can contain a header row |
| .tsv | tab-separated values | sales.tsv | Can contain a header row |
| .tab | tab-separated values | sales.tab | Can contain a header row |
| .txt | separated values with a custom delimiter | sales.txt | Can contain a header row |
| .xlsx | Microsoft Excel 2000 and above | sales.xlsx | Must be .xlsx. Supports Worksheet selection. |
| .kml | Keyhole Markup Language (KML), a tag-based structure with nested attributes based on the XML standard | sales.kml | All tags are case-sensitive. For more information on using the KML file to render an insight, refer to the example of an Advanced Map insight using a KML file. |
| .kmz | zipped version of a KML file | sales.kmz | |
| .zip | zipped file | sales.zip | |
| .gzip | GNU zipped file | sales.gzip | |
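
Because KML is based on XML, tag names are matched exactly, including letter case. The following minimal Python sketch uses only the standard library to illustrate the point. The sample document and the `placemark_names` helper are hypothetical and are not part of Incorta.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample: the second element uses a lowercase tag on purpose.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>Store A</name></Placemark>
    <placemark><name>Store B</name></placemark>
  </Document>
</kml>"""


def placemark_names(kml_text):
    """Return the names of all correctly cased Placemark elements."""
    root = ET.fromstring(kml_text)
    return [
        placemark.findtext("kml:name", default="", namespaces=KML_NS)
        for placemark in root.iterfind(".//kml:Placemark", KML_NS)
    ]


print(placemark_names(SAMPLE_KML))
```

Only the element written as `Placemark` is found. The lowercase `placemark` element is well-formed XML but is a different tag, so a parser that looks for `Placemark` skips it.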

You can access all folders and files that you own and any folders or files that someone shares with you.

The Data Files connector supports the following functionality:


| Feature | Supported |
| --- | --- |
| Chunking | |
| Data Agent | |
| Encryption at Ingest | |
| Incremental Load | |
| Multi-Source | |
| OAuth | |
| Performance Optimized | |
| Remote | |
| Single-Source | |
| Spark Extraction | |
| Webhook Callbacks | |

Steps to use Data Files connector

Here are the high-level steps, tools, and procedures to use the Data Files connector with the LocalFiles data source:

Upload one or more data files and folders, including subfolders and files

A folder can contain zero or more files and zero or more subfolders. Incorta preserves the folder hierarchy. Incorta uploads only files that have one of the supported file extensions listed above. After upload, Incorta unzips compressed folders and files.

Here are the steps to upload one or more local data folders or local data files, including subfolders and their files:

  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in Data Files, select Upload Data Folder.
  • In the Upload Data Folder dialog, in Upload Options, optionally select Overwrite existing file.
  • Drag and drop one or more files or parent folders to the Upload Data Folder dialog.
Note

The Upload Data Folder option and dialog enable you to upload both data files and folders. If you upload a file or folder that already exists, a warning message appears and prompts you to either cancel the upload or overwrite the existing files or folders.
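
Before uploading a folder, it can help to see which files match the supported extensions. The following Python sketch is a local pre-check only and does not call any Incorta API. The `split_by_support` function is hypothetical, and the case-insensitive extension comparison is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported file extensions table.
SUPPORTED_EXTENSIONS = {
    ".csv", ".tsv", ".tab", ".txt", ".xlsx", ".kml", ".kmz", ".zip", ".gzip",
}


def split_by_support(folder):
    """Walk a folder tree and separate supported files from the rest.

    Returns two sorted lists of paths relative to the folder, so the
    subfolder hierarchy stays visible.
    """
    root = Path(folder)
    supported, skipped = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Assumption: extensions are compared without regard to case.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(relative)
        else:
            skipped.append(relative)
    return supported, skipped
```

For example, `split_by_support("exports")` on a hypothetical `exports` folder returns the relative paths that would qualify for upload and the ones that would not.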

Create a physical schema with the Schema Wizard

Here are the steps to create a Data Files physical schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:
    • For Enter a name, enter the physical schema name.
    • For Select a Datasource, select LocalFiles.
    • Optionally create a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your folder or file. For an .xlsx file, select a worksheet.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.

Create a physical schema with the Schema Designer

Here are the steps to create a Data Files physical schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In Name, specify the physical schema name, and select Save.
  • In the Tables tab, select +.
  • In the Table Data Source dialog, specify the Type as File System and Data Source as LocalFiles.
  • Specify the table data source properties.
  • Select Add.
  • In the Table Editor, in the Table section, enter the table name.
  • To save your changes, select Done in the Action bar.

Data Files table data source properties

You can specify a single file or a single folder in the Table Data Source dialog. Both the Schema Designer and the Table Editor represent a file or folder data source as a single-source table. To select a folder, you must enable Union Files.

Note

This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.

Common properties for a local data file and a local data folder

Here are some of the common properties that apply whether you select a file or a folder:

| Property | Control | Description |
| --- | --- | --- |
| Type | drop down list | Select type as File System |
| Data Source | drop down list | Select data source as LocalFiles |
| File Type | drop down list | Select Text (.csv, .tsv, .tab, .txt), Excel (.xlsx), or Keyhole Markup Language (.kml) |
| Has Header? | toggle | Select if the first row contains column header values |
| Callback | toggle | Enables the Callback URL field |
| Callback URL | text box | This property appears when the Callback toggle is enabled. Specify the URL. |

Common local data file properties

Here are some of the common properties specifically related to selecting a file of either type Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):

| Property | Control | Description |
| --- | --- | --- |
| Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and an Update File. |
| File | button | Select a file to open the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file and select Add. |
| Update File | button | With Incremental enabled, Update File is available. Select a file to open the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file and select Add. |
Note

With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
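
The note above can be read as the difference between append-only and key-based loading. The following Python sketch models that reading with plain lists of dictionaries. The `apply_incremental` function is hypothetical, and the update-by-key branch is an assumption inferred from the note rather than a description of how the Loader Service works internally.

```python
def apply_incremental(existing_rows, update_rows, key=None):
    """Model the result of an incremental load from an Update File.

    existing_rows, update_rows: lists of dicts, one dict per row.
    key: name of the Key column, or None when no Key column is defined.
    """
    if key is None:
        # No Key column: every update row is appended, nothing is updated.
        return existing_rows + update_rows

    # Assumption: with a Key column, a matching key replaces the existing row
    # and an unseen key adds a new row.
    merged = {row[key]: row for row in existing_rows}
    for row in update_rows:
        merged[row[key]] = row
    return list(merged.values())


existing = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
updates = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

appended = apply_incremental(existing, updates)          # 4 rows, id 2 appears twice
upserted = apply_incremental(existing, updates, "id")    # 3 rows, id 2 updated
```

Without a key, the table ends up with two rows for `id` 2, which is why defining a Key column matters when the Update File can contain changed rows.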

Properties for an Excel Workbook file

Here are the specific properties for an Excel Workbook (.xlsx) file:

PropertyControlDescription
Worksheetdrop down listSelect a given worksheet for the Excel Workbook

Properties for a Text file

Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:

| Property | Control | Description |
| --- | --- | --- |
| Date Format | drop down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
| Timestamp Format | drop down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
| Character Set | drop down list | Select a supported character set. |
| Separator | drop down list | Available when the selected File Type is Text. Specify a separator for columns in the row values. Comma and Tab are standard delimiters. Other requires that you specify a value such as a colon (:). |
| Other | text box | Available when the Separator is Other. Enter one or more characters to specify the column separator or delimiter between values in a row. |
| Enable Chunking | toggle | Enable for large file sizes |
| Chunk Size (MB) | text box | Enter a value in megabytes (MB) to specify the chunk size |
| Enable Spark Based Extraction (Deprecated) | toggle | Enable a Spark job to parallelize the data ingest. This feature is no longer supported, and there are plans to remove it in future connector versions. |
| Max Number of Parallel File Extractors | text box | Enter a value for the number of Extractors, which is typically no more than the number of available cores. |
| Memory Per Extractor | text box | Enter a value for memory in gigabytes (GB). This is typically the amount of dedicated memory divided by the number of available cores. |
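
To picture how the Separator, Other, and Has Header? properties interact, consider the following Python sketch. It is a simplified illustration, not Incorta's parser: the `parse_delimited` function and the generated `column_N` names are hypothetical, and the sketch ignores quoting and escaping.

```python
def parse_delimited(text, separator, has_header=True):
    """Split delimited text into a header list and a list of data rows.

    separator: one or more characters, as allowed by the Other text box.
    has_header: whether the first row holds column header values.
    """
    if not separator:
        raise ValueError("separator must contain at least one character")

    # Skip blank lines, then split each remaining line on the separator.
    rows = [line.split(separator) for line in text.splitlines() if line.strip()]

    if has_header and rows:
        return rows[0], rows[1:]

    # Hypothetical fallback names when the file has no header row.
    width = len(rows[0]) if rows else 0
    return [f"column_{i + 1}" for i in range(width)], rows


sample = "region::amount\nEMEA::100\nAPAC::250\n"
header, data = parse_delimited(sample, "::")
```

Here the two-character separator `::` stands in for a custom value entered in Other, and the first row supplies the column names because `has_header` is true.
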
Common folder properties

Folder properties are available when you enable Union Files. It is not possible to select a parent folder.

Here are the properties specifically related to selecting a folder:

| Property | Control | Description |
| --- | --- | --- |
| Incremental | toggle | Enable to support incremental loading |
| Union Files | toggle | Enable to select all files within a given folder. When enabled, you will only be able to select a folder from LocalFiles. |
| Directory | button | Select a folder from your LocalFiles. It is not possible to select a parent folder. |
| Include | text box | Enter a keyword with a wildcard * symbol to include specific named files within the folder |
| Exclude | text box | Enter a keyword with a wildcard * symbol to exclude specific named files within the folder |
| Include Sub-Directories Files | toggle | Enable to include files from sub-folders |
| Add Filename as a column | toggle | Enable to add the filename of the file as a column. You will then need to specify a column name. |
| Filename column | text box | Enter a column name for the filename such as source_file_name |
Note

With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
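
The Include, Exclude, Include Sub-Directories Files, and Add Filename as a column properties can be pictured with a short Python sketch. It is an illustration under assumptions, not Incorta's implementation: the `select_files` and `add_filename_column` functions are hypothetical, and matching the wildcard against the file name only (not the full path) is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def select_files(directory, include="*", exclude=None, include_subdirectories=False):
    """Return the files in a folder that pass the include and exclude patterns."""
    root = Path(directory)
    candidates = root.rglob("*") if include_subdirectories else root.glob("*")
    selected = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        # Assumption: patterns are compared with the file name only.
        if not fnmatchcase(path.name, include):
            continue
        if exclude and fnmatchcase(path.name, exclude):
            continue
        selected.append(path)
    return selected


def add_filename_column(rows, filename, column_name="source_file_name"):
    """Return copies of the rows with the originating file name added."""
    return [{**row, column_name: filename} for row in rows]
```

For example, with an include pattern of `sales*` and an exclude pattern of `*2023*`, a file named `sales_2024.csv` is selected while `sales_2023.csv` and `notes.csv` are not. Adding the file name as a column keeps each row traceable to its source file after the union.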

Folder properties for Excel Workbook files
Important

This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.

Here are the properties specifically related to selecting a folder with Excel Workbook (.xlsx) as the file type:

| Property | Control | Description |
| --- | --- | --- |
| Worksheet | drop down list | Select a tab for a worksheet |
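
Because every workbook in the folder must contain the common Worksheet name, a local check before upload can catch mismatches early. The following Python sketch reads worksheet names with the standard library only. The `worksheet_names` and `workbooks_missing_sheet` functions are hypothetical, and the sketch assumes the usual .xlsx package layout, in which sheet names are listed in `xl/workbook.xml` under the transitional SpreadsheetML namespace.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Assumption: the workbook uses the transitional SpreadsheetML namespace.
SHEET_TAG = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheet"


def worksheet_names(xlsx_path):
    """Return the worksheet names declared in an .xlsx workbook."""
    with zipfile.ZipFile(xlsx_path) as archive:
        # Assumption: the workbook part lives at the conventional location.
        workbook_xml = archive.read("xl/workbook.xml")
    return [sheet.get("name") for sheet in ET.fromstring(workbook_xml).iter(SHEET_TAG)]


def workbooks_missing_sheet(folder, worksheet):
    """List the .xlsx files in a folder that lack the given worksheet name."""
    missing = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        if worksheet not in worksheet_names(path):
            missing.append(path.name)
    return missing
```

An empty result from `workbooks_missing_sheet(folder, "Sales")`, where `Sales` is a hypothetical worksheet name, means every workbook in the folder has that worksheet.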

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of physical schemas, select the Data Files physical schema.
  • In the Schema Designer, in the Action bar, select Diagram.

Load the physical schema

Here are the steps to perform a Full Load of the Data Files physical schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of physical schemas, select the Data Files physical schema.
  • In the Schema Designer, in the Action bar, select Load → Full Load.
  • To review the load status, in Last Load Status, select the date.

Explore the physical schema

With the full load of the Data Files physical schema completed, you can use the Analyzer to explore the physical schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the physical schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the Data Files physical schema.
  • In the Schema Designer, in the Action bar, select Explore Data.