Tools → Data APIs for External Notebooks

About Data APIs for External Notebooks

Incorta Data APIs allow you to access data stored in Incorta, run queries on that data, and save data back to Incorta from the machine learning tools you prefer, including external notebooks such as Jupyter or Zeppelin. These RESTful APIs are accompanied by a Python library that lets you perform read, query, and save operations seamlessly. This approach keeps the data within Incorta, which avoids costly data movement and yields better performance.

Python Library Operations

Here are the Python library operations available in the data APIs for external notebooks:

IncortaAPI
    Input:       TENANT, USER_NAME, API_KEY
    Output:      IncortaAPI object
    Example:     incorta = IncortaAPI(tenant, user, api_key)
    Description: Instantiates the IncortaAPI object

read
    Input:       TABLE_NAME
    Output:      Spark DataFrame
    Example:     df = incorta.read("SALES.SALES")
    Description: Reads an Incorta table as a Spark data frame

read_pandas
    Input:       TABLE_NAME
    Output:      Pandas DataFrame
    Example:     df = incorta.read_pandas("SALES.SALES")
    Description: Reads an Incorta table as a Pandas data frame

sql
    Input:       SPARK_SQL_QUERY
    Output:      Spark DataFrame
    Example:     df = incorta.sql("SELECT * FROM SALES.SALES")
    Description: Executes the given query written in Spark SQL and returns the results as a Spark data frame

sql_pg
    Input:       PG_SQL_QUERY
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg("SELECT * FROM BusinessView.MyView")
    Description: Executes the given query written in PostgreSQL syntax and returns the results as a Spark data frame. This has the full capabilities of SQLi queries over the Spark port, so it can access business views and formula columns.

sql_pg_engine
    Input:       PG_SQL_QUERY, PASSWORD=None
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg_engine("SELECT * FROM SALES.SALES")
    Description: Executes a query over the Incorta engine

sql_pg_nessy
    Input:       PG_SQL_QUERY, PASSWORD=None
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg_nessy("SELECT * FROM SALES.SALES", "incortapass")
    Description: Executes a query over the Incorta SQLi engine

save_parquet
    Input:       DATAFRAME, DATA_FILE_NAME
    Example:     incorta.save_parquet(df, "myTable.parquet")
    Description: Saves a Spark data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_parquet
    Input:       PANDAS_DATAFRAME, DATA_FILE_NAME
    Example:     incorta.save_parquet(df, "myTable.parquet")
    Description: Saves a Pandas data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_csv
    Input:       DATAFRAME, DATA_FILE_NAME, INDEX
    Example:     incorta.save_csv(df, "myTable.csv", index=False)
    Description: Saves a Spark data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_csv
    Input:       PANDAS_DATAFRAME, DATA_FILE_NAME, INDEX
    Example:     incorta.save_csv(df, "myTable.csv", index=False)
    Description: Saves a Pandas data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.
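
The following is a minimal sketch of a typical round trip using these operations. It assumes an already instantiated incorta object (see Configure the Data APIs below) and the SALES.SALES table from the examples above; the column names CUSTOMER_ID and AMOUNT are placeholders.

    # Read an Incorta table as a Spark data frame and inspect its schema
    df = incorta.read("SALES.SALES")
    df.printSchema()

    # Aggregate with Spark SQL; the column names here are placeholders
    totals = incorta.sql(
        "SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_AMOUNT "
        "FROM SALES.SALES GROUP BY CUSTOMER_ID"
    )

    # Save the result to the tenant's Data directory as Parquet so it
    # can be used as a source for other Incorta tables
    incorta.save_parquet(totals, "sales_totals.parquet")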
Note

The Data APIs currently support only local sessions: the Spark driver must run in the same environment as the notebook, with access to Incorta shared storage. The exceptions are the sql_pg_engine and sql_pg_nessy methods, which can be used in remote sessions.
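
For example, a notebook running outside the Incorta environment could query a business view through the SQLi engine; the view name and password below are placeholders:

    # Runs in a remote session; no local Spark driver access to
    # Incorta shared storage is required
    df = incorta.sql_pg_nessy("SELECT * FROM BusinessView.MyView", "incortapass")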

Configure the Data APIs

Here are the steps to configure the data APIs:

  • Install the required Python dependencies.

    pip install pyspark==SPARK_VERSION
    pip install pandas
    pip install findspark
    pip install fastparquet pyarrow    # required to save a Pandas data frame as Parquet
  • Install the Python library.

    python -m easy_install IncortaAnalytics/IncortaNode/bin/data_apis/python/incorta_data_apis.egg
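
    To confirm the library installed into the active Python environment, you can try importing it (the class name matches the initialization step below):

    python -c "from incorta_data_apis import IncortaAPI"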
  • Create an API key.

    • Sign in to the Incorta Direct Data Platform™.
    • Select the user icon in the top right corner → Security → Generate API Key.
  • Create a configuration file based on the template located at IncortaAnalytics/IncortaNode/bin/data_apis/python/data-api.conf.template.

    incorta.host =  // Incorta host, e.g., localhost, xx.xx.xx.xx
    incorta.port =  // Analytics port, e.g., 8080
    spark.home =    // Path to Spark home, e.g.,
                    // /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master =  // Spark master URL
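
    For reference, a filled-in file might look like the following. All values are examples for a single-node installation; local[*] runs the Spark driver locally, which matches the local-session requirement noted above.

    incorta.host = localhost
    incorta.port = 8080
    spark.home = /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master = local[*]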
  • Initialize the library in your notebook or script.

    from incorta_data_apis import *
    incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, PATH_TO_CONF_FILE)
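
    For example, with placeholder values throughout (the tenant name, user, API key, and configuration file path below are all assumptions):

    from incorta_data_apis import *

    # All four arguments are placeholders; substitute your own values
    incorta = IncortaAPI("demo", "admin", "YOUR_API_KEY",
                         "/home/incorta/data-api.conf")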
Note

For a comprehensive example, refer to sample-notebook.ipynb in IncortaAnalytics/IncortaNode/bin/data_apis/python.