External Tables

Overview

Incorta provides flexible architectures for accessing data stored in external data lakes, such as AWS S3 and Google Cloud Storage (GCS). Earlier releases introduced remote tables, where data access relies on materialized views and, more recently, Spark SQL views. Release 2026.3.0 extends this no-copy read capability with external tables, which let Incorta access tables in data lakes without re-extracting or duplicating data.

External tables are a new schema object type that enables reading and querying Delta Lake tables in Amazon S3, Azure Data Lake Storage (ADLS) Gen2, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files, without extracting, deduplicating, or storing any source data in Incorta. This eliminates redundant data copies while providing full analytical capabilities, including joins, formulas, and dashboard visualizations. Unlike remote tables, which require creating a materialized view (MV) or Spark SQL view before the Analytics Service can access them, external tables require only a load from staging.

Key benefits and use cases

Eliminate data duplication and reduce costs: Access Delta Lake data directly without creating copies in Incorta, reducing storage costs and eliminating synchronization overhead across analytics platforms.
Cross-cluster data sharing: Enable multiple Incorta clusters to access the same Delta Lake tables simultaneously, facilitating distributed analytics and collaborative operations.
Accelerated query performance: Leverage Incorta's Direct Data Mapping (DDM) files to optimize query execution on remote Delta Lake tables, treating them as optimized tables without extraction.
Lakehouse integration: Organizations managing data in other lakehouse platforms can add Incorta analytics without re-engineering existing data pipelines or re-ingesting data.

How it works

Create a data lake data source with appropriate cloud storage credentials.
Define an external table. Incorta discovers Delta Lake tables and generates the necessary metadata.
Create joins and formula columns as needed.
Load the external table from staging to map to the latest Delta Lake version and generate the formula and join DDM files.
Build insights and business views with the same analytical capabilities as traditional tables while reading directly from the source.
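
The "latest Delta Lake version" in the load step refers to the version numbering of the Delta Lake transaction log, where each commit is a JSON file in the table's _delta_log directory named with a 20-digit, zero-padded version number. The following minimal Python sketch shows how that version can be derived from a directory listing and compared with the version mapped at the last load. The function names and the sample listing are hypothetical, and the sketch illustrates the Delta Lake naming convention only; it is not a description of how Incorta implements the load.

```python
import re

# Delta Lake commit files are named <20-digit zero-padded version>.json.
# Checkpoint, .crc, and _last_checkpoint files do not match this pattern.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(log_file_names):
    """Return the highest commit version in a _delta_log listing, or None."""
    versions = [
        int(match.group(1))
        for name in log_file_names
        if (match := COMMIT_FILE.match(name))
    ]
    return max(versions) if versions else None


def needs_reload(mapped_version, log_file_names):
    """True when the log holds commits newer than the last mapped version."""
    latest = latest_delta_version(log_file_names)
    return latest is not None and (mapped_version is None or latest > mapped_version)


# Hypothetical listing of a table's _delta_log directory.
listing = [
    "00000000000000000000.json",
    "00000000000000000001.json",
    "00000000000000000002.json",
    "00000000000000000002.checkpoint.parquet",
    "_last_checkpoint",
]

print(latest_delta_version(listing))  # 2
print(needs_reload(1, listing))       # True
print(needs_reload(2, listing))       # False
```

A check of this kind could help decide when to schedule a load from staging, since the external table keeps pointing at the version mapped during its last load.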

External table metadata

The metadata of an external table includes the following:

Name
Dataset
Columns
Formula Columns
Joins
Filters (Runtime Security Filters)
Advanced Settings: Loaded Data in Memory

Name

An external table name must adhere to the name validation rules of all physical schema objects. Once you save the external table, you cannot change its name.

Dataset

An external table requires a single dataset that references a Delta Lake table. When defining a new external table, you specify the dataset properties as follows:

Type: Specify the data source type. The only available type is Data Lake.
Data Source: Select the data source from the list of Data Lake data sources you have access to.
Delta Lake Directory Path: Enter the path to the Delta Lake table directory, relative to the root directory configured in the data source.
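
As an illustration of the relative path property, the following Python sketch joins a data source root directory with a Delta Lake Directory Path and checks that a directory listing looks like a Delta Lake table, which always contains a _delta_log subdirectory. The function names, the example root /data/lake, and the rule that rejects paths outside the root are assumptions made for this sketch, not documented Incorta behavior. The sketch uses plain POSIX-style paths; object store URLs such as s3://... would need their own handling.

```python
import posixpath


def resolve_table_path(root_dir, relative_path):
    """Join the data source root and the relative table path (sketch only)."""
    root = posixpath.normpath(root_dir)
    full = posixpath.normpath(posixpath.join(root, relative_path))
    # Assumption of this sketch: reject paths that leave the root directory.
    if full != root and not full.startswith(root.rstrip("/") + "/"):
        raise ValueError(f"{relative_path!r} is outside the root {root!r}")
    return full


def looks_like_delta_table(directory_entries):
    """A Delta Lake table directory contains a _delta_log subdirectory."""
    return "_delta_log" in directory_entries


# Hypothetical root directory and relative table path.
table_dir = resolve_table_path("/data/lake", "sales/orders")
print(table_dir)  # /data/lake/sales/orders

# Hypothetical listing of that directory.
print(looks_like_delta_table(["_delta_log", "part-00000.snappy.parquet"]))  # True
```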

Columns

An external table column represents the source column in the external data source, that is, the Delta Lake table. You cannot change the column name, data type, or encryption. You also cannot set a key column or preview data in an external table.

Formula Columns

A formula column contains an expression that returns a scalar value of a specific data type. Incorta computes the formula column and persists it to shared storage as a Direct Data Mapping (DDM) file.

Joins

You can create joins in which an external table is either the child or the parent table. During load jobs, Incorta calculates the joins and saves them as DDM files.

Runtime security filters

You can apply one or more runtime security filters to restrict row access to the external table. Any dependent object, such as a runtime business view or dashboard insight, automatically applies the runtime security filter.

Advanced Settings: Loaded Data in Memory

You can specify whether to load the external table into Engine memory.

Select All (Performance Optimized), the default setting, to load all data into Engine memory, which allows querying data directly from the data lake table.
Select None (Disk-only) to skip loading data into Engine memory, which minimizes memory usage. In this case, you can query the external table using MVs or other tools, such as Spark.

Known limitations

Incorta requires direct access to storage buckets containing the Delta Lake data. Catalog-based access is not yet supported.
For now, Incorta cannot query Delta tables created by Microsoft Fabric.
Source data updates are not auto-detected; schedule loads from staging to refresh mappings and DDM files.
The Delta Lake tables must be deduplicated since Incorta does not perform any deduplication.
Incremental reads are not yet supported.
Deletion vectors of Delta tables are not supported.
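
Two of these limitations can be checked before defining an external table. The following Python sketch is one possible pre-check under stated assumptions: it scans the newline-delimited JSON actions of a Delta Lake commit file for signs of deletion vectors (the deletionVectors reader feature, the delta.enableDeletionVectors table property, or a deletionVector entry on an add action), and it reports duplicate key values in a sample of rows. The function names, the sample commit lines, and the order_id column are hypothetical; this is not an Incorta utility.

```python
import json
from collections import Counter


def uses_deletion_vectors(commit_lines):
    """Scan Delta commit actions (one JSON object per line) for deletion vectors."""
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        protocol = action.get("protocol") or {}
        if "deletionVectors" in (protocol.get("readerFeatures") or []):
            return True
        config = (action.get("metaData") or {}).get("configuration") or {}
        if str(config.get("delta.enableDeletionVectors", "")).lower() == "true":
            return True
        if (action.get("add") or {}).get("deletionVector"):
            return True
    return False


def duplicate_keys(rows, key_columns):
    """Return key tuples that occur more than once in the given rows."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical commit file content with deletion vectors enabled.
sample_commit = [
    '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7, '
    '"readerFeatures": ["deletionVectors"], "writerFeatures": ["deletionVectors"]}}',
    '{"metaData": {"id": "example-id", '
    '"configuration": {"delta.enableDeletionVectors": "true"}}}',
]
print(uses_deletion_vectors(sample_commit))  # True

# Hypothetical sample rows keyed by order_id.
sample_rows = [{"order_id": 1}, {"order_id": 2}, {"order_id": 1}]
print(duplicate_keys(sample_rows, ["order_id"]))  # [(1,)]
```

A complete check would read every commit and checkpoint in _delta_log and the full table rather than a sample; the sketch only shows what to look for.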

Note

You cannot configure column attributes in external tables, except for setting the column function to dimension or measure.

External tables vs. remote tables

Aspect	External tables	Remote tables
Purpose	Query Delta Lake tables directly from the data lake with full analytics capabilities.	Reference very large external datasets and aggregate them without loading the full dataset.
Availability	2026.3.0	Earlier releases
Supported Formats	Delta tables only	CSV, Parquet, ORC, Delta Lake (2025.7+), Union files (2026.3.0+)
Supported Connectors	AWS S3, Azure ADLS, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files	AWS S3, GCS, Azure Gen2, Hadoop, Hive, Athena, FTP, SFTP, Data Lake Local Files
Data Extraction	None	None
Load Operation	Load from staging required	No load operation
Data Storage in Incorta	No raw data copy; only metadata and DDM files for joins and formulas.	No raw data is stored; only results are materialized when using MVs.
Analytics Service Access	Direct access after load from staging	Via materialized views or Spark SQL views only
DDM Support	Yes, generated during load from staging	No
Join Support	Native, create joins as with standard tables	Via materialized views only
Formula Columns	Native, create formulas as with standard tables	Via materialized views or Spark SQL views only
Dashboard Integration	Direct, use in insights and dashboards	Via materialized views or Spark SQL views only
Performance Optimization	DDM files optimize query execution	Depends on Spark SQL or external engine
Memory Loading	Configurable (Performance Optimized or Disk-only)	Minimal, no data in Incorta
Cross-Cluster Sharing	Yes, multiple clusters access same tables	Limited, depends on external access
Data Refresh	Scheduled load from staging to update mappings	Always reads current data, no refresh needed
Primary Use Case	Delta Lake analytics without duplication	Large file aggregation and external BI access
