Tools → Data APIs for External Notebooks
Incorta Data APIs allow you to access data stored in Incorta, run queries on the data, and save data back to Incorta from the machine learning tools you prefer, including external notebooks such as Jupyter or Zeppelin. These RESTful APIs are accompanied by a Python library to allow you to seamlessly perform read, query, and save operations. This approach keeps the data within Incorta, which provides a higher level of performance.
Here are the Python library operations available in the data APIs for external notebooks:
| Parameters | Returns | Description |
|---|---|---|
| TENANT, USER_NAME, API_KEY | IncortaAPI object | Instantiates the IncortaAPI object |
| TABLE_NAME | Spark DataFrame | Reads an Incorta table as a Spark data frame |
| TABLE_NAME | Pandas DataFrame | Reads an Incorta table as a Pandas data frame |
| SPARK_SQL_QUERY | Spark DataFrame | Executes the given query written in Spark SQL and returns the results as a Spark data frame |
| PG_SQL_QUERY | Spark DataFrame | Executes the given query written in PostgreSQL syntax and returns the results as a Spark data frame. This has the full capabilities of SQLi queries over the Spark port, so it can access business views and formula columns. |
| PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | Executes a query over the Incorta engine |
| PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | Executes a query over the Incorta SQLi engine |
| DATAFRAME, DATA_FILE_NAME | — | Saves the given Spark data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| PANDAS_DATAFRAME, DATA_FILE_NAME | — | Saves the given Pandas data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| DATAFRAME, DATA_FILE_NAME, INDEX | — | Saves the given Spark data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| PANDAS_DATAFRAME, DATA_FILE_NAME, INDEX | — | Saves the given Pandas data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
The Data APIs currently support only local sessions, which means that the Spark driver must run in the same environment as the notebook, with access to Incorta shared storage. The exceptions are the sql_pg_nessy methods, which can be used in remote sessions.
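The operations in the table above combine into a simple read → transform → save loop. The sketch below makes that call pattern concrete with a minimal stand-in class; the class and its method names (`read`, `save`) are illustrative assumptions only, since the reference table lists parameters and return types rather than method names, and this stand-in does not connect to Incorta.

```python
# Minimal stand-in sketching the data-API call pattern from the table
# above. Method names here are assumptions for illustration; the real
# IncortaAPI object talks to an Incorta cluster, this class does not.

class IncortaAPIStandIn:
    def __init__(self, tenant, user_name, api_key):
        # Matches the table's constructor parameters: TENANT, USER_NAME, API_KEY
        self.tenant = tenant
        self.user_name = user_name
        self.api_key = api_key
        # In-memory tables standing in for Incorta storage
        self._tables = {"SALES": [{"region": "EMEA", "amount": 100}]}

    def read(self, table_name):
        # Table row: TABLE_NAME -> data frame (here, a plain list of dicts)
        return self._tables[table_name]

    def save(self, dataframe, data_file_name):
        # Table row: DATAFRAME, DATA_FILE_NAME -> Parquet file in the
        # tenant's Data directory (here, just another in-memory entry)
        self._tables[data_file_name] = dataframe


incorta = IncortaAPIStandIn("demo_tenant", "admin", "API_KEY")
rows = incorta.read("SALES")          # read an existing table
incorta.save(rows, "SALES_COPY")      # save it back under a new name
```

The real library returns Spark or Pandas data frames where this sketch returns plain lists, but the shape of the workflow is the same.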
Here are the steps to configure the data APIs:
Install the required Python dependencies.

```shell
pip install pyspark==SPARK_VERSION
pip install pandas
pip install findspark
pip install fastparquet pyarrow  # required to save a Pandas data frame as Parquet
```
Install the Python library.

```shell
pip install IncortaAnalytics/IncortaNode/bin/data_apis/python/incorta_data_apis-1.0-py3-none-any.whl
```
Create an API key.
- Sign in to the Incorta Direct Data Platform™.
- Select the user icon in the top right corner → Security → Generate API Key.
Create a configuration file from the template located at IncortaAnalytics/IncortaNode/bin/data_apis/python/data-api.conf.template.

```
incorta.host = // Incorta host, e.g., localhost or xx.xx.xx.xx
incorta.port = // Analytics port, e.g., 8080
spark.home = // path to Spark home, e.g., /home/incorta/IncortaAnalytics/IncortaNode/spark
spark.master = // Spark master URL
```
Initialize the library in your notebook or script.

```python
from incorta_data_apis import *

incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, PATH_TO_CONF_FILE)
```
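As an illustration, a filled-in data-api.conf might look like the following. All values here are example assumptions; substitute your own host, port, Spark home path, and Spark master URL.

```
incorta.host = localhost
incorta.port = 8080
spark.home = /home/incorta/IncortaAnalytics/IncortaNode/spark
spark.master = spark://localhost:7077
```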
For a comprehensive example, refer to