External Tables

Overview

Incorta provides flexible architectures for accessing data stored in external data lakes, such as AWS S3 and Google Cloud Storage (GCS). Earlier releases introduced remote tables, which rely on materialized views (MVs) and, more recently, Spark SQL views for data access. Release 2026.3.0 extends this no-copy read capability with external tables, which let Incorta access tables in data lakes without re-extracting or duplicating data.

External tables are a new schema object type for reading and querying Delta Lake tables in Amazon S3, Azure Data Lake Storage (ADLS) Gen2, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files without extracting, deduplicating, or storing any source data in Incorta. This eliminates redundant data copies while providing full analytical capabilities, including joins, formulas, and dashboard visualizations. Unlike remote tables, which require a materialized view (MV) or Spark SQL view before the Analytics Service can access them, external tables require only a load from staging.

Key benefits and use cases

  • Eliminate data duplication and reduce costs: Access Delta Lake data directly without creating copies in Incorta, reducing storage costs and eliminating synchronization overhead across analytics platforms.
  • Cross-cluster data sharing: Enable multiple Incorta clusters to access the same Delta Lake tables simultaneously, facilitating distributed analytics and collaborative operations.
  • Accelerated query performance: Leverage Incorta's Direct Data Mapping (DDM) files to optimize query execution on remote Delta Lake tables, treating them as optimized tables without extraction.
  • Lakehouse integration: Organizations managing data in other lakehouse platforms can add Incorta analytics without re-engineering existing data pipelines or re-ingesting data.

How it works

  1. Create a data lake data source with appropriate cloud storage credentials.
  2. Define an external table. Incorta discovers Delta Lake tables and generates the necessary metadata.
  3. Create joins and formula columns as needed.
  4. Load the external table from staging to map to the latest Delta Lake version and generate the formula and join DDM files.
  5. Build insights and business views with the same analytical capabilities as traditional tables while reading directly from the source.
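
Step 4 maps the external table to the latest Delta Lake version. How Incorta performs this mapping internally is not described here. The sketch below only illustrates the underlying idea, using the Delta Lake convention that each commit is stored in `_delta_log` as a JSON file named with a 20-digit, zero-padded version number. The function names and the `mapped_version` parameter are hypothetical, and the table directory is assumed to be reachable as a local path.

```python
import re
from pathlib import Path
from typing import Optional

# Delta commit files look like 00000000000000000042.json
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(table_dir: str) -> Optional[int]:
    """Return the highest committed version in _delta_log, or None if empty."""
    log_dir = Path(table_dir) / "_delta_log"
    versions = [
        int(match.group(1))
        for entry in log_dir.iterdir()
        if (match := COMMIT_FILE.match(entry.name))
    ]
    return max(versions, default=None)


def needs_reload(table_dir: str, mapped_version: int) -> bool:
    """True when the source has commits newer than the version last mapped.

    mapped_version is a hypothetical value you would record yourself after
    each load from staging; it is not an Incorta setting.
    """
    latest = latest_delta_version(table_dir)
    return latest is not None and latest > mapped_version
```

A check like this could help decide how often to schedule loads from staging, since a load is only useful once the source table has advanced to a newer version.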

External table metadata

The metadata of an external table includes the following:

  • Name
  • Dataset
  • Columns
  • Formula Columns
  • Joins
  • Filters (Runtime Security Filters)
  • Advanced Settings: Loaded Data in Memory

Name

An external table name must adhere to the name validation rules of all physical schema objects. Once you save the external table, you cannot change its name.

Dataset

An external table requires a single dataset that references a Delta Lake table. When defining a new external table, you specify the dataset properties as follows:

  • Type: Specify the data source type. The only available type is Data Lake.
  • Data Source: Select the data source from the list of Data Lake data sources you have access to.
  • Delta Lake Directory Path: Enter the path to the Delta Lake table directory, relative to the root directory configured in the data source.
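
Because the directory path is relative to the data source root, it can help to check how the two combine before saving the table. The following is a minimal sketch, not Incorta's own validation logic: `resolve_table_path` and `looks_like_delta_table` are hypothetical helpers, and the second one assumes the storage is mounted as a local filesystem (as with Data Lake Local Files). It relies only on the fact that a Delta Lake table directory contains a `_delta_log` subdirectory.

```python
import posixpath
from pathlib import Path


def resolve_table_path(root: str, relative_path: str) -> str:
    """Join a data source root with a table path given relative to it."""
    if relative_path.startswith("/"):
        raise ValueError("path must be relative to the data source root")
    root_norm = posixpath.normpath(root)
    full = posixpath.normpath(posixpath.join(root_norm, relative_path))
    # Reject paths that climb out of the root through ".." segments.
    if full != root_norm and not full.startswith(root_norm.rstrip("/") + "/"):
        raise ValueError("path escapes the data source root")
    return full


def looks_like_delta_table(table_dir: str) -> bool:
    """A Delta Lake table directory contains a _delta_log subdirectory."""
    return (Path(table_dir) / "_delta_log").is_dir()
```

For example, with a root of `/data/lake` and a directory path of `sales/orders`, `resolve_table_path` returns `/data/lake/sales/orders`, while a path such as `../other` is rejected.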

Columns

An external table column represents the source column in the external data source, that is, the Delta Lake table. You cannot change a column's name, data type, or encryption setting. You also cannot set a key column or preview data in an external table.

Formula Columns

A formula column contains an expression that returns a scalar value of a specific data type. Incorta computes the formula column during a load from staging and persists it to shared storage as a Direct Data Mapping (DDM) file.

Joins

You can create joins in which an external table is the child or the parent table. During load jobs, Incorta calculates the joins and saves them as DDM files.

Runtime security filters

You can apply one or more runtime security filters to restrict row access to the external table. Any dependent object, such as a runtime business view or dashboard insight, will automatically apply the runtime security filter.

Advanced Settings: Loaded Data in Memory

You can specify whether to load the external table into the Engine memory.

  • Select All (Performance Optimized), the default setting, to load all data into the Engine memory, which allows querying data directly from the data lake table.
  • Select None (Disk-only) to keep all data out of the Engine memory, which minimizes memory usage. In this case, you can query the external table using MVs or other tools, such as Spark.

Known limitations

  • Incorta requires direct access to storage buckets containing the Delta Lake data. Catalog-based access is not yet supported.
  • For now, Incorta cannot query Delta tables created by Microsoft Fabric.
  • Source data updates are not auto-detected; schedule loads from staging to refresh mappings and DDM files.
  • The Delta Lake tables must be deduplicated since Incorta does not perform any deduplication.
  • Incremental reads are not yet supported.
  • Deletion vectors of Delta tables are not supported.

Note

You cannot configure column attributes in external tables, except for setting the column function to dimension or measure.
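
Since deletion vectors are not supported, it may be worth checking a Delta table before pointing an external table at it. The sketch below is one conservative way to do that, under stated assumptions: `uses_deletion_vectors` is a hypothetical helper, the table directory is assumed to be reachable as a local path, and only the JSON commit files in `_delta_log` are scanned (a table whose early commits were removed after checkpointing would also need its checkpoint files inspected). It looks for the `deletionVectors` reader feature, the `delta.enableDeletionVectors` table property, and `deletionVector` entries on added files.

```python
import json
from pathlib import Path


def uses_deletion_vectors(table_dir: str) -> bool:
    """Return True if any JSON commit shows deletion vectors in use or enabled."""
    log_dir = Path(table_dir) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        # Each line of a commit file is one JSON-encoded action.
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            protocol = action.get("protocol") or {}
            if "deletionVectors" in (protocol.get("readerFeatures") or []):
                return True
            config = (action.get("metaData") or {}).get("configuration") or {}
            enabled = str(config.get("delta.enableDeletionVectors", ""))
            if enabled.lower() == "true":
                return True
            if (action.get("add") or {}).get("deletionVector"):
                return True
    return False
```

The check scans the full commit history, so it also flags tables where the feature was enabled at any earlier point.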

External tables vs. remote tables

Aspect | External tables | Remote tables
------ | --------------- | -------------
Purpose | Query Delta Lake tables directly from the data lake with full analytics capabilities. | Reference very large external datasets and aggregate them without loading the full dataset.
Availability | 2026.3.0 | Earlier releases
Supported Formats | Delta tables only | CSV, Parquet, ORC, Delta Lake (2025.7+), Union files (2026.3.0+)
Supported Connectors | AWS S3, Azure ADLS, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files | AWS S3, GCS, Azure Gen2, Hadoop, Hive, Athena, FTP, SFTP, Data Lake Local Files
Data Extraction | None | None
Load Operation | Load from staging required | No load operation
Data Storage in Incorta | No raw data copy; only metadata and DDM files for joins and formulas. | No raw data is stored; only results are materialized when using MVs.
Analytics Service Access | Direct access after load from staging | Via materialized views or Spark SQL views only
DDM Support | Yes, generated during load from staging | No
Join Support | Native, create joins as with standard tables | Via materialized views only
Formula Columns | Native, create formulas as with standard tables | Via materialized views or Spark SQL views only
Dashboard Integration | Direct, use in insights and dashboards | Via materialized views or Spark SQL views only
Performance Optimization | DDM files optimize query execution | Depends on Spark SQL or external engine
Memory Loading | Configurable (Performance-optimized or Disk-only) | Minimal, no data in Incorta
Cross-Cluster Sharing | Yes, multiple clusters access same tables | Limited, depends on external access
Data Refresh | Scheduled load from staging to update mappings | Always reads current data, no refresh needed
Primary Use Case | Delta Lake analytics without duplication | Large file aggregation and external BI access