Concepts → Remote Table
Overview
A remote table is a physical schema table definition in Incorta that references data stored in an external data lake. It gives you access to that data without extracting it into Incorta’s shared storage or loading it into Engine memory. This architecture lets you query large datasets directly from their source locations, such as Google Cloud Storage and Amazon S3.
Remote tables in Incorta are primarily accessed through Materialized Views (MVs) and Spark SQL Views (supported starting 2026.3.0). The main use case for remote tables is to reference large external datasets and perform transformations or aggregations within Incorta, producing smaller, materialized datasets optimized for analytics.
Unlike standard physical tables, remote tables are not extracted or loaded by the Loader Service. Consequently, they are not directly available to the Analytics Service for querying. Instead, they must be referenced through MVs or Spark SQL Views, where the data is processed and materialized before being consumed by dashboards or external BI tools.
This approach reduces storage and memory usage because only the final transformed or aggregated results are loaded into the Incorta Engine, rather than the full raw dataset.
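To make the reduction concrete, the following is a minimal sketch of the aggregate-then-materialize pattern. It uses Python's built-in `sqlite3` module as a stand-in for both the external data lake and Spark SQL, so it is not Incorta code. The `remote_sales` table, its columns, and the `build_summary` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in for the external data lake: an in-memory SQLite database.
# In Incorta this role would be played by a remote table read through Spark.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE remote_sales (region TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO remote_sales VALUES (?, ?, ?)",
    [
        ("EMEA", "2025-01-03", 120.0),
        ("EMEA", "2025-01-04", 80.0),
        ("APAC", "2025-01-03", 200.0),
        ("APAC", "2025-01-05", 50.0),
        ("AMER", "2025-01-04", 300.0),
    ],
)

# The kind of aggregation an MV might run: one output row per region
# instead of one row per order. Only this result would be materialized.
AGGREGATE_SQL = """
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM remote_sales
    GROUP BY region
    ORDER BY region
"""


def build_summary(connection):
    """Return the aggregated rows that would be materialized and loaded."""
    return connection.execute(AGGREGATE_SQL).fetchall()


summary = build_summary(conn)
raw_count = conn.execute("SELECT COUNT(*) FROM remote_sales").fetchone()[0]

print(f"raw rows: {raw_count}, materialized rows: {len(summary)}")
for row in summary:
    print(row)
```

In this toy dataset, five raw rows collapse into three summary rows. In a real deployment the raw table would stay in the data lake, and only the summary would be loaded into the Engine.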
Key benefits and use cases
- Aggregating large data without full extraction: Remote tables enable referencing very large datasets and creating aggregated materialized views or Spark SQL views without loading the full raw data into Incorta.
- Reducing storage duplication: Duplicating large datasets in shared storage can consume significant disk space and memory. Remote tables eliminate unnecessary duplication when only a subset or an aggregated dataset is needed.
- Data lake–first architectures: Organizations maintaining centralized data lakes can share datasets with Incorta without disrupting existing storage architecture.
- External BI tool integration: External BI tools can access remote data via Incorta’s SQL interfaces without physically loading it into Incorta.
Supported connectors
Incorta supports remote tables for the following data lake connectors:
- Apache Hadoop (HDFS)
- Apache Hive
- AWS Athena
- AWS S3
- Google Cloud Storage (GCS)
- Microsoft Azure Gen2
- Local Files (Data Lake)
- FTP
- SFTP
How remote tables work
1. Create a data lake data source with appropriate cloud storage credentials.
2. Define a physical schema table in Incorta that references the data source, turn on the Remote toggle, and specify the remote data file type.
   Note: Before 2025.7, remote tables supported only text-based, ORC, and Parquet files. Support for Delta tables starts in 2025.7, and support for union files starts in 2026.3.0.
3. Do one of the following to access the remote data:
   - Create an MV that references the remote table, and load the MV to enable Analytics Service queries.
   - Create a Spark SQL view (available starting 2026.3.0) for quick insights without materializing data.
   - Use external BI tools, such as Power BI and Tableau, to query remote data directly via the standard SQL Interface or Advanced SQL Interface (supported starting 2026.3.0).
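The setup steps and release notes above imply a few compatibility checks: the connector must support remote tables, and the file type must be supported in the release you run. The sketch below encodes those rules in plain Python as a quick reference. It is not an Incorta API: `check_remote_table`, `parse_release`, and the two lookup tables are hypothetical names. The connector and file-type values are copied from this page. The code assumes that release numbers compare numerically, part by part.

```python
# Connectors this page lists as supporting remote tables.
REMOTE_CONNECTORS = {
    "Apache Hadoop (HDFS)",
    "Apache Hive",
    "AWS Athena",
    "AWS S3",
    "Google Cloud Storage (GCS)",
    "Microsoft Azure Gen2",
    "Local Files (Data Lake)",
    "FTP",
    "SFTP",
}

# File type -> first release that supports it for remote tables.
# None means this page states no minimum release for that type.
FILE_TYPE_MIN_RELEASE = {
    "text": None,
    "orc": None,
    "parquet": None,
    "delta": "2025.7",
    "union": "2026.3.0",
}


def parse_release(release):
    """Turn '2025.7' or '2026.3.0' into a comparable three-part tuple."""
    parts = [int(p) for p in release.split(".")]
    parts += [0] * (3 - len(parts))  # pad so 2025.7 compares as 2025.7.0
    return tuple(parts)


def check_remote_table(connector, file_type, release):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if connector not in REMOTE_CONNECTORS:
        problems.append(
            f"connector {connector!r} is not listed as supporting remote tables"
        )
    key = file_type.lower()
    if key not in FILE_TYPE_MIN_RELEASE:
        problems.append(f"file type {file_type!r} is not listed for remote tables")
    else:
        minimum = FILE_TYPE_MIN_RELEASE[key]
        if minimum is not None and parse_release(release) < parse_release(minimum):
            problems.append(
                f"file type {file_type!r} requires release {minimum} or later"
            )
    return problems


print(check_remote_table("AWS S3", "Delta", "2025.5"))
print(check_remote_table("AWS S3", "Delta", "2025.7"))
```

For example, `check_remote_table("AWS S3", "Delta", "2025.5")` reports one problem, because Delta support starts in 2025.7. The same call with release `"2025.7"` returns an empty list.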
Remote tables vs. performance-optimized physical tables
| Aspect | Remote tables | Performance-optimized physical tables |
|---|---|---|
| Data Location | External storage | Incorta memory |
| Loading | No extraction | Full extraction and loading |
| Analytics Service Access | Via MVs or Spark SQL views only | Direct access |
| Memory Usage | Minimal | High (data in memory) |
| Query Performance | Depends on the external system and Spark | Optimized in-memory queries |
| Storage Requirements | None in Incorta | Full dataset stored |
| Use Case | Large datasets, selective access | Frequently queried data, fast analytics |
Known limitations
- The Analytics Service cannot query remote tables directly. To use remote data in dashboards, insights, or business views, you must first create a materialized view or Spark SQL view that references the remote table.
- Standard incremental load capabilities don't apply to remote tables.
- The performance of queries against remote tables (via MVs or Spark SQL views) depends on network latency and source system availability and performance.
- Remote tables do not benefit from Incorta's in-memory processing and DDM processes unless materialized as MVs.