About Data APIs for External Notebooks

The Incorta Data APIs let you read data stored in Incorta, run queries on it, and save results back to Incorta from the machine learning tools you prefer, including external notebooks such as Jupyter or Zeppelin. These RESTful APIs come with a Python library that wraps the read, query, and save operations. Because the data stays within Incorta, this approach provides a higher level of performance.

Python Library Operations

Here are the Python library operations available in the data APIs for external notebooks:

| API | Input | Output | Example | Description |
|---|---|---|---|---|
| IncortaAPI | TENANT, USER_NAME, API_KEY | IncortaAPI object | incorta = IncortaAPI(tenant, user, api_key) | Instantiates the IncortaAPI object |
| read | TABLE_NAME | Spark DataFrame | df = incorta.read("SALES.SALES") | Reads an Incorta table as a Spark data frame |
| read_pandas | TABLE_NAME | Pandas DataFrame | df = incorta.read_pandas("SALES.SALES") | Reads an Incorta table as a Pandas data frame |
| sql | SPARK_SQL_QUERY | Spark DataFrame | df = incorta.sql("SELECT * FROM SALES.SALES") | Executes the given query written in Spark SQL and returns the results as a Spark data frame |
| sql_pg | PG_SQL_QUERY | Spark DataFrame | df = incorta.sql_pg("SELECT * FROM BusinessView.MyView") | Executes the given query written in PostgreSQL syntax and returns the results as a Spark data frame. This has the full capabilities of SQLi queries over the Spark port, so it can access business views and formula columns. |
| sql_pg_engine | PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | df = incorta.sql_pg_engine("SELECT * FROM SALES.SALES") | Executes a query over the Incorta engine |
| sql_pg_nessy | PG_SQL_QUERY, PASSWORD=None | Spark DataFrame | df = incorta.sql_pg_nessy("SELECT * FROM SALES.SALES", "incortapass") | Executes a query over the Incorta SQLi engine |
| save_parquet | DATAFRAME, DATA_FILE_NAME | | incorta.save_parquet(df, "myTable.parquet") | Saves this Spark data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| save_parquet | PANDAS_DATAFRAME, DATA_FILE_NAME | | incorta.save_parquet(df, "myTable.parquet") | Saves this Pandas data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| save_csv | DATAFRAME, DATA_FILE_NAME, INDEX | | incorta.save_csv(df, "myTable.csv", index=False) | Saves this Spark data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |
| save_csv | PANDAS_DATAFRAME, DATA_FILE_NAME, INDEX | | incorta.save_csv(df, "myTable.csv", index=False) | Saves this Pandas data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other tables in Incorta. |

Note

The Data APIs currently support only local sessions, which means that the Spark driver must run in the same environment as the notebook with access to Incorta shared storage. The exceptions are the sql_pg_engine and sql_pg_nessy methods, which can be used in remote sessions.
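
As an illustration of how the operations listed above might be combined, here is a minimal sketch of two small wrappers. The function names `run_pg_query` and `export_table_to_parquet` and the `remote` flag are hypothetical and not part of the library. Only the method names and argument order (`sql_pg`, `sql_pg_engine`, `read`, `save_parquet`) are taken from the operations list. The sketch assumes `incorta` is an already initialized IncortaAPI object.

```python
def run_pg_query(incorta, query, remote=False):
    """Run a PostgreSQL-syntax query, picking the method by session type.

    `incorta` is an already initialized IncortaAPI object. The `remote`
    flag is a hypothetical convenience, not part of the library.
    """
    if remote:
        # Per the note above, sql_pg_engine can be used in remote sessions.
        return incorta.sql_pg_engine(query)
    # Local session: sql_pg can also reach business views and formula columns.
    return incorta.sql_pg(query)


def export_table_to_parquet(incorta, table_name, file_name):
    """Read an Incorta table and save it as a Parquet file (local sessions only)."""
    df = incorta.read(table_name)        # Spark data frame
    incorta.save_parquet(df, file_name)  # written to the tenant's Data directory
    return df
```

With an initialized object, a call might look like `df = run_pg_query(incorta, "SELECT * FROM BusinessView.MyView")` in a local session, or the same call with `remote=True` when the Spark driver does not share storage with Incorta.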

Configure the Data APIs

Here are the steps to configure the data APIs:

  • Install the required Python dependencies.

    pip install pyspark==SPARK_VERSION
    pip install pandas
    pip install findspark
    pip install fastparquet pyarrow # required to save a Pandas data frame as Parquet
  • Install the Python library.

    pip install IncortaAnalytics/IncortaNode/bin/data_apis/python/incorta_data_apis-1.0-py3-none-any.whl
  • Create an API key.

    • Sign in to the Incorta Direct Data Platform™.
    • Select the user icon in the top right corner → Security → Generate API Key.
  • Create a configuration file based on the template located at IncortaAnalytics/IncortaNode/bin/data_apis/python/data-api.conf.template:

    incorta.host = // Incorta host, e.g: localhost, xx.xx.xx.xx
    incorta.port = // Analytics port, e.g: 8080
    spark.home = // Path to spark home, e.g: /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master = // Spark master URL
  • Initialize the library in your notebook or script.

    from incorta_data_apis import *
    incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, PATH_TO_CONF_FILE)
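
The configuration step can also be scripted. The following is a minimal sketch, assuming the configuration file accepts plain `key = value` lines with the four keys shown in the template. The helper name `write_data_api_conf` is hypothetical and not part of the library, and the values in the usage comment (including the Spark master URL) are placeholders.

```python
from pathlib import Path

# The four keys listed in data-api.conf.template.
REQUIRED_KEYS = ("incorta.host", "incorta.port", "spark.home", "spark.master")


def write_data_api_conf(path, settings):
    """Write a Data API configuration file and return its path.

    Assumes the file format is one `key = value` pair per line,
    as the template suggests. Raises ValueError if a key is missing.
    """
    missing = [key for key in REQUIRED_KEYS if settings.get(key) in (None, "")]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    lines = [f"{key} = {settings[key]}" for key in REQUIRED_KEYS]
    target = Path(path)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


# Example usage (placeholder values; adjust for your environment):
# conf_path = write_data_api_conf("data-api.conf", {
#     "incorta.host": "localhost",
#     "incorta.port": 8080,
#     "spark.home": "/home/incorta/IncortaAnalytics/IncortaNode/spark",
#     "spark.master": "spark://localhost:7077",  # hypothetical master URL
# })
# incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, str(conf_path))
```

Validating the keys before writing means a missing value surfaces as a clear error in the notebook rather than during library initialization.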
Note

For a comprehensive example, refer to sample-notebook.ipynb in the IncortaAnalytics/IncortaNode/bin/data_apis/python directory.