Remote Table

Overview

A remote table is a physical schema table definition in Incorta that references data stored in an external data lake. It gives access to that data without extracting it into Incorta’s shared storage or loading it into Engine memory, so large datasets can be queried directly from their source locations, such as Google Cloud Storage and Amazon S3.

Remote tables in Incorta are primarily accessed through materialized views (MVs) and Spark SQL views (supported starting 2026.3.0). The main use case for remote tables is to reference large external datasets and perform transformations or aggregations within Incorta, producing smaller, materialized datasets optimized for analytics.

Unlike standard physical tables, remote tables are not extracted or loaded by the Loader Service. Consequently, they are not directly available to the Analytics Service for querying. Instead, they must be referenced through MVs or Spark SQL views, where the data is processed and materialized before being consumed by dashboards or external BI tools.

This approach reduces storage and memory usage because only the final transformed or aggregated results are loaded into the Incorta Engine, rather than the full raw dataset.
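The aggregate-then-materialize pattern described above can be illustrated with a minimal sketch. It is not Incorta code: Python's built-in sqlite3 module stands in for the external data lake and the Spark engine, and the table name `remote_sales`, its columns, and the sample rows are all hypothetical. The point is only that the query result, not the raw data, is what would be materialized.

```python
import sqlite3

# Stand-in for the external data lake: an in-memory SQLite database plays the
# role of the remote table. Table and column names are hypothetical.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE remote_sales (region TEXT, order_date TEXT, amount REAL)")
lake.executemany(
    "INSERT INTO remote_sales VALUES (?, ?, ?)",
    [
        ("EMEA", "2025-01-03", 120.0),
        ("EMEA", "2025-01-04", 80.0),
        ("APAC", "2025-01-03", 200.0),
        ("APAC", "2025-01-05", 50.0),
        ("AMER", "2025-01-04", 300.0),
    ],
)

# The kind of aggregation an MV might run against a remote table.
# Only the grouped result, not the raw rows, would be materialized.
MV_QUERY = """
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM remote_sales
    GROUP BY region
    ORDER BY region
"""

raw_row_count = lake.execute("SELECT COUNT(*) FROM remote_sales").fetchone()[0]
materialized_rows = lake.execute(MV_QUERY).fetchall()

print(f"raw rows at the source: {raw_row_count}")
print(f"rows materialized:      {len(materialized_rows)}")
for region, order_count, total_amount in materialized_rows:
    print(region, order_count, total_amount)
```

In a real deployment the gap between the two row counts would be far larger, which is where the storage and memory savings come from.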

Key benefits and use cases

  • Aggregating large data without full extraction: Remote tables enable referencing very large datasets and creating aggregated materialized views or Spark SQL views without loading the full raw data into Incorta.
  • Reducing storage duplication: Duplicating large datasets in shared storage can consume significant disk space and memory. Remote tables eliminate unnecessary duplication when only a subset or an aggregated dataset is needed.
  • Data lake–first architectures: Organizations maintaining centralized data lakes can share datasets with Incorta without disrupting existing storage architecture.
  • External BI tool integration: External BI tools can access remote data via Incorta’s SQL interfaces without physically loading it into Incorta.

Supported connectors

Incorta supports remote tables for the following data lake connectors:

How remote tables work

  1. Create a data lake data source with appropriate cloud storage credentials.

  2. Define a physical schema table in Incorta that references the data source, turn on the Remote toggle, and specify the remote data file type.

    Note

    In releases before 2025.7, remote tables support text-based, ORC, and Parquet files only. Delta table support starts in 2025.7, and union file support starts in 2026.3.0.

  3. Do one of the following to access the remote data:

    • Create an MV that references the remote table, and load the MV to enable Analytics Service queries.
    • Create a Spark SQL view (available starting 2026.3.0) for quick insights without materializing data.
    • Use external BI tools, such as Power BI and Tableau, to query remote data directly via the standard SQL Interface or Advanced SQL Interface (supported starting 2026.3.0).
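The file-type support stated under step 2 can be summarized as a small lookup. The sketch below is a hypothetical helper, not part of any Incorta API: the function names, the dictionary keys, and the assumption that release names are purely numeric and dot-separated are all illustrative. The page does not state the earliest release that supports text-based, ORC, and Parquet files, so the sketch treats them as supported in every release.

```python
# Hypothetical helper (not an Incorta API): encodes only the release
# information stated in this page.

def parse_release(release):
    """Turn a release name such as '2025.7' into a comparable 3-part tuple."""
    parts = [int(part) for part in release.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '2026.3' compares equal to '2026.3.0'
    return tuple(parts)

# Minimum release per remote file type. None means the type is supported in
# releases before 2025.7; the earliest such release is not stated here.
MIN_RELEASE = {
    "text": None,
    "orc": None,
    "parquet": None,
    "delta": "2025.7",
    "union": "2026.3.0",
}

def is_file_type_supported(file_type, release):
    """Return True if the given release supports the remote file type."""
    key = file_type.lower()
    if key not in MIN_RELEASE:
        raise ValueError(f"unknown remote file type: {file_type!r}")
    minimum = MIN_RELEASE[key]
    if minimum is None:
        return True
    return parse_release(release) >= parse_release(minimum)

print(is_file_type_supported("Parquet", "2024.7.0"))
print(is_file_type_supported("delta", "2025.6.0"))
print(is_file_type_supported("union", "2026.3.0"))
```

Padding the parsed release to three parts avoids a subtle tuple-comparison issue, where `(2026, 3)` would otherwise sort below `(2026, 3, 0)`.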

Remote tables vs. performance-optimized physical tables

Aspect                   | Remote tables                            | Performance-optimized physical tables
-------------------------+------------------------------------------+--------------------------------------
Data Location            | External storage                         | Incorta memory
Loading                  | No extraction                            | Full extraction and loading
Analytics Service Access | Via MVs or Spark SQL views only          | Direct access
Memory Usage             | Minimal                                  | High (data in memory)
Query Performance        | Depends on the external system and Spark | Optimized in-memory queries
Storage Requirements     | None in Incorta                          | Full dataset stored
Use Case                 | Large datasets, selective access         | Frequently queried data, fast analytics

Known limitations

  • The Analytics Service cannot query remote tables directly. To use remote data in dashboards, insights, or business views, you must first create a materialized view or Spark SQL view that references the remote table.
  • Standard incremental load capabilities don't apply to remote tables.
  • The performance of queries against remote tables (via MVs or Spark SQL views) depends on network latency and source system availability and performance.
  • Remote tables do not benefit from Incorta's in-memory processing and DDM processes unless materialized as MVs.