Tools → Data APIs for External Notebooks
About Data APIs for External Notebooks
Incorta Data APIs allow you to access data stored in Incorta, run queries on that data, and save data back to Incorta from the machine learning tools you prefer, including external notebooks such as Jupyter or Zeppelin. These RESTful APIs are accompanied by a Python library that lets you seamlessly read, query, and save data. This approach keeps the data within Incorta, which improves performance.
Python Library Operations
Here are the Python library operations available in the data APIs for external notebooks:
| API | Input | Output | Example | Description |
|---|---|---|---|---|
| `IncortaAPI` | TENANT, USER_NAME, API_KEY | IncortaAPI object | `incorta = IncortaAPI(tenant, user, api_key)` | Instantiates the IncortaAPI object |
| `read` | TABLE_NAME | Spark DataFrame | `df = incorta.read("SALES.SALES")` | Reads an Incorta table as a Spark data frame |
| `read_pandas` | TABLE_NAME | Pandas DataFrame | `df = incorta.read_pandas("SALES.SALES")` | Reads an Incorta table as a Pandas data frame |
| `sql` | SPARK_SQL_QUERY | Spark DataFrame | `df = incorta.sql("SELECT * FROM SALES.SALES")` | Executes the given query written in Spark SQL and returns the results as a Spark data frame |
| `sql_pg` | PG_SQL_QUERY | Spark DataFrame | `df = incorta.sql_pg("SELECT * FROM BusinessView.MyView")` | Executes the given query written in PostgreSQL syntax and returns the results as a Spark data frame. This has the full capabilities of SQLi queries over the Spark port, so it can access business views and formula columns. |
| `sql_pg_engine` | PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | `df = incorta.sql_pg_engine("SELECT * FROM SALES.SALES")` | Executes a query over the Incorta engine |
| `sql_pg_nessy` | PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | `df = incorta.sql_pg_nessy("SELECT * FROM SALES.SALES", "incortapass")` | Executes a query over the Incorta SQLi engine |
| `save_parquet` | DATAFRAME, DATA_FILE_NAME | | `incorta.save_parquet(df, "myTable.parquet")` | Saves this Spark data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| `save_parquet` | PANDAS_DATAFRAME, DATA_FILE_NAME | | `incorta.save_parquet(df, "myTable.parquet")` | Saves this Pandas data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| `save_csv` | DATAFRAME, DATA_FILE_NAME, INDEX | | `incorta.save_csv(df, "myTable.csv", index=False)` | Saves this Spark data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| `save_csv` | PANDAS_DATAFRAME, DATA_FILE_NAME, INDEX | | `incorta.save_csv(df, "myTable.csv", index=False)` | Saves this Pandas data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
The Data APIs currently support only local sessions, which means that the Spark driver must run in the same environment as the notebook, with access to Incorta shared storage. The exceptions are the `sql_pg_engine` and `sql_pg_nessy` methods, which can be used in remote sessions.
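Putting these operations together, a typical session reads or queries a table, transforms the result, and saves it back to the tenant's Data directory. The sketch below uses only calls documented above; the query, column names, and output file name are illustrative, and running it requires an `incorta` object created against a live deployment:

```python
# Sketch of a read-query-save session with the Data APIs.
# Assumes `incorta` was created with IncortaAPI(tenant, user, api_key);
# REGION, AMOUNT, and the file name are illustrative placeholders.

def summarize_sales(incorta):
    """Run a Spark SQL aggregation in Incorta and persist the result."""
    df = incorta.sql(
        "SELECT REGION, SUM(AMOUNT) AS TOTAL "
        "FROM SALES.SALES GROUP BY REGION"
    )
    # Write the Spark data frame back to the tenant's Data directory,
    # where it can serve as a source for other Incorta tables.
    incorta.save_parquet(df, "region_totals.parquet")
    return df
```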
Configure the Data APIs
Here are the steps to configure the data APIs:
1. Install the required Python dependencies.

    ```
    pip install pyspark==SPARK_VERSION
    pip install pandas
    pip install findspark
    pip install fastparquet pyarrow  # required to save a Pandas data frame as Parquet
    ```

2. Install the Incorta Python library.

    ```
    pip install IncortaAnalytics/IncortaNode/bin/data_apis/python/incorta_data_apis-1.0-py3-none-any.whl
    ```

3. Create an API key.
    - Sign in to the Incorta Direct Data Platform™.
    - Select the user icon in the top right corner → Security → Generate API Key.

4. Create a configuration file using the following template, located at `IncortaAnalytics/IncortaNode/bin/data_apis/python/data-api.conf.template`:

    ```
    incorta.host = // Incorta host, e.g.: localhost, xx.xx.xx.xx
    incorta.port = // Analytics port, e.g.: 8080
    spark.home = // Path to Spark home, e.g.: /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master = // Spark master URL
    ```

5. Initialize the library in your notebook or script.

    ```python
    from incorta_data_apis import *

    incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, PATH_TO_CONF_FILE)
    ```
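For reference, a filled-in configuration file for a single-node installation might look like the following. The host, port, paths, and master URL are illustrative and must match your own deployment:

```
incorta.host = localhost
incorta.port = 8080
spark.home = /home/incorta/IncortaAnalytics/IncortaNode/spark
spark.master = spark://localhost:7077
```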
For a comprehensive example, refer to `sample-notebook.ipynb` in `IncortaAnalytics/IncortaNode/bin/data_apis/python`.