External Tables
Overview
Incorta provides flexible architectures for accessing data stored in external data lakes, such as AWS S3 and Google Cloud Storage (GCS). Earlier releases introduced remote tables, which rely on materialized views and, more recently, Spark SQL views for data access. Release 2026.3.0 extends this no-copy read capability with external tables, which let Incorta access tables in data lakes without re-extracting or duplicating data.
External tables are a new schema object type for reading and querying Delta Lake tables in Amazon S3, Azure Data Lake Storage (ADLS) Gen2, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files, without extracting, deduplicating, or storing any source data in Incorta. This eliminates redundant data copies while providing full analytical capabilities, including joins, formulas, and dashboard visualizations. Unlike remote tables, which require creating a materialized view (MV) or a view before the Analytics Service can access the data, external tables require only a load from staging.
Key benefits and use cases
- Eliminate data duplication and reduce costs: Access Delta Lake data directly without creating copies in Incorta, reducing storage costs and eliminating synchronization overhead across analytics platforms.
- Cross-cluster data sharing: Enable multiple Incorta clusters to access the same Delta Lake tables simultaneously, facilitating distributed analytics and collaborative operations.
- Accelerated query performance: Leverage Incorta's Direct Data Mapping (DDM) files to optimize query execution on remote Delta Lake tables, treating them as optimized tables without extraction.
- Lakehouse integration: Organizations managing data in other lakehouse platforms can add Incorta analytics without re-engineering existing data pipelines or re-ingesting data.
How it works
- Create a data lake data source with appropriate cloud storage credentials.
- Define an external table. Incorta discovers Delta Lake tables and generates the necessary metadata.
- Create joins and formula columns as needed.
- Load the external table from staging to map to the latest Delta Lake version and generate the formula and join DDM files.
- Build insights and business views with the same analytical capabilities as traditional tables while reading directly from the source.
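The load step above maps the external table to the latest Delta Lake version. As a rough illustration of what "latest version" means on the storage side, the sketch below finds the highest committed version in a listing of a table's `_delta_log` directory, where Delta Lake names commit files with a zero-padded 20-digit version number. This is not Incorta's implementation; the function names and the `last_mapped_version` parameter are hypothetical and only show how you might decide when a scheduled load from staging is worthwhile.

```python
import re

# Delta Lake commit files are named <20-digit version>.json,
# for example 00000000000000000012.json.
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(log_file_names):
    """Return the highest committed version in a _delta_log listing, or None."""
    versions = []
    for name in log_file_names:
        match = _COMMIT_RE.match(name)
        if match:  # skips checkpoints, _last_checkpoint, and other files
            versions.append(int(match.group(1)))
    return max(versions) if versions else None


def needs_reload(log_file_names, last_mapped_version):
    """Hypothetical helper: True if the table has commits newer than the
    version that the last load from staging mapped to."""
    latest = latest_delta_version(log_file_names)
    if latest is None:
        return False
    return last_mapped_version is None or latest > last_mapped_version
```

For example, calling `needs_reload(listing, 10)` on a listing whose newest commit file is `00000000000000000011.json` indicates that the mapping is one version behind.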
External table metadata
The metadata of an external table includes the following:
- Name
- Dataset
- Columns
- Formula Columns
- Joins
- Filters (Runtime Security Filters)
- Advanced Settings: Loaded Data in Memory
Name
An external table name must adhere to the name validation rules of all physical schema objects. Once you save the external table, you cannot change its name.
Dataset
An external table requires a single dataset that references a Delta Lake table. When defining a new external table, you specify the dataset properties as follows:
- Type: Specify the data source type. The only available type is Data Lake.
- Data Source: Select the data source from the list of Data Lake data sources you have access to.
- Delta Lake Directory Path: Enter the path to the Delta Lake table directory, relative to the root directory configured in the data source.
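To illustrate how a relative Delta Lake Directory Path combines with the root directory of the data source, here is a minimal sketch. It assumes plain POSIX-style paths inside the bucket or file system (not full URIs such as `s3://...`), and `resolve_delta_path` is a hypothetical helper, not an Incorta function.

```python
from posixpath import join, normpath


def resolve_delta_path(root_dir, relative_path):
    """Join a data source root directory with a relative table path.

    Assumes POSIX-style paths; raises ValueError if the result would
    fall outside the root directory (for example, via "..").
    """
    root = normpath("/" + root_dir.strip("/"))
    full = normpath(join(root, relative_path.strip("/")))
    prefix = root.rstrip("/") + "/"
    if full != root and not full.startswith(prefix):
        raise ValueError(f"{relative_path!r} is outside the root {root!r}")
    return full
```

With a root directory of `/datalake/sales`, the relative path `delta/orders` resolves to `/datalake/sales/delta/orders`.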
Columns
An external table column represents the source column in the external data source, that is, the Delta Lake table. You cannot change the column name, data type, or encryption. You also cannot set a key column in an external table or preview its data.
Formula Columns
A formula column contains an expression that returns a scalar value of a specific data type. Incorta computes the formula column during a load and persists the result to shared storage as a Direct Data Mapping (DDM) file.
Joins
You can create joins in which an external table is either the child or the parent table. During load jobs, Incorta calculates the joins and saves them as DDM files.
Runtime security filters
You can apply one or more runtime security filters to restrict row access to the external table. Any dependent object, such as a runtime business view or dashboard insight, will automatically apply the runtime security filter.
Advanced Settings: Loaded Data in Memory
You can specify whether to load the external table into Engine memory.
- Select All (Performance Optimized), the default setting, to load all data into Engine memory, which allows querying data directly from the data lake table.
- Select None (Disk-only) to load no data into Engine memory, which minimizes memory usage. In this case, you can query the external table using materialized views (MVs) or other tools, such as Spark.
Known limitations
- Incorta requires direct access to storage buckets containing the Delta Lake data. Catalog-based access is not yet supported.
- For now, Incorta cannot query Delta tables created by Microsoft Fabric.
- Source data updates are not auto-detected; schedule loads from staging to refresh mappings and DDM files.
- The Delta Lake tables must be deduplicated since Incorta does not perform any deduplication.
- Incremental reads are not yet supported.
- Deletion vectors of Delta tables are not supported.
- You cannot configure column attributes in external tables, except for setting the column function to dimension or measure.
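Two of the limitations above, deletion vectors and duplicate rows, can be checked on the source side before you define an external table. The following is a minimal sketch, not an Incorta utility. It assumes you can read the table's `_delta_log` commit files (one JSON action per line, as the Delta Lake protocol defines) and a sample of rows as dictionaries. The function names are hypothetical.

```python
import json
from collections import Counter


def uses_deletion_vectors(commit_lines):
    """Return True if any Delta log action indicates deletion vectors."""
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        # The protocol action lists table features that readers must support.
        protocol = action.get("protocol") or {}
        if "deletionVectors" in (protocol.get("readerFeatures") or []):
            return True
        # The metaData action carries table properties.
        config = (action.get("metaData") or {}).get("configuration") or {}
        if str(config.get("delta.enableDeletionVectors", "")).lower() == "true":
            return True
        # An add action may reference a deletion vector for its data file.
        if (action.get("add") or {}).get("deletionVector"):
            return True
    return False


def duplicate_keys(rows, key_columns):
    """Return key tuples that occur more than once in the given rows."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]
```

If `uses_deletion_vectors` returns `True`, or `duplicate_keys` returns a non-empty list for the columns you treat as the business key, the table likely needs to be rewritten or cleaned at the source before Incorta reads it.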
External tables vs. remote tables
| Aspect | External tables | Remote tables |
|---|---|---|
| Purpose | Query Delta Lake tables directly from the data lake with full analytics capabilities. | Reference very large external datasets and aggregate them without loading the full dataset. |
| Availability | 2026.3.0 | Earlier releases |
| Supported Formats | Delta tables only | CSV, Parquet, ORC, Delta Lake (2025.7+), Union files (2026.3.0+) |
| Supported Connectors | AWS S3, Azure ADLS Gen2, Google Cloud Storage, Apache Hadoop (HDFS), and Data Lake Local Files | AWS S3, GCS, Azure ADLS Gen2, Hadoop, Hive, Athena, FTP, SFTP, Data Lake Local Files |
| Data Extraction | None | None |
| Load Operation | Load from staging required | No load operation |
| Data Storage in Incorta | No raw data copy; only metadata and DDM files for joins and formulas. | No raw data is stored; only results are materialized when using MVs. |
| Analytics Service Access | Direct access after load from staging | Via materialized views or Spark SQL views only |
| DDM Support | Yes, generated during load from staging | No |
| Join Support | Native, create joins as with standard tables | Via materialized views only |
| Formula Columns | Native, create formulas as with standard tables | Via materialized views or Spark SQL views only |
| Dashboard Integration | Direct, use in insights and dashboards | Via materialized views or Spark SQL views only |
| Performance Optimization | DDM files optimize query execution | Depends on Spark SQL or external engine |
| Memory Loading | Configurable (Performance Optimized or Disk-only) | Minimal, no data in Incorta |
| Cross-Cluster Sharing | Yes, multiple clusters access same tables | Limited, depends on external access |
| Data Refresh | Scheduled load from staging to update mappings | Always reads current data, no refresh needed |
| Primary Use Case | Delta Lake analytics without duplication | Large file aggregation and external BI access |