Tools → Data APIs for External Notebooks

About Data APIs for External Notebooks

Incorta Data APIs allow you to access data stored in Incorta, run queries on that data, and save data back to Incorta from the machine learning tools you prefer, including external notebooks such as Jupyter or Zeppelin. These RESTful APIs are accompanied by a Python library that lets you perform read, query, and save operations seamlessly. This approach keeps the data within Incorta, which avoids costly data movement and yields better performance.

Python Library Operations

Here are the Python library operations available in the data APIs for external notebooks:

IncortaAPI
    Input:       TENANT, USER_NAME, API_KEY
    Output:      IncortaAPI object
    Example:     incorta = IncortaAPI(tenant, user, api_key)
    Description: Instantiates the IncortaAPI object

read
    Input:       TABLE_NAME
    Output:      Spark DataFrame
    Example:     df = incorta.read("SALES.SALES")
    Description: Reads an Incorta table as a Spark data frame

read_pandas
    Input:       TABLE_NAME
    Output:      Pandas DataFrame
    Example:     df = incorta.read_pandas("SALES.SALES")
    Description: Reads an Incorta table as a Pandas data frame

sql
    Input:       SPARK_SQL_QUERY
    Output:      Spark DataFrame
    Example:     df = incorta.sql("SELECT * FROM SALES.SALES")
    Description: Executes the given query written in Spark SQL and returns the results as a Spark data frame

sql_pg
    Input:       PG_SQL_QUERY
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg("SELECT * FROM BusinessView.MyView")
    Description: Executes the given query written in PostgreSQL syntax and returns the results as a Spark data frame. This has the full capabilities of SQLi queries over the Spark port, so it can access business views and formula columns.

sql_pg_engine
    Input:       PG_SQL_QUERY, PASSWORD=None
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg_engine("SELECT * FROM SALES.SALES")
    Description: Executes a query over the Incorta engine

sql_pg_nessy
    Input:       PG_SQL_QUERY, PASSWORD=None
    Output:      Spark DataFrame
    Example:     df = incorta.sql_pg_nessy("SELECT * FROM SALES.SALES", "incortapass")
    Description: Executes a query over the Incorta SQLi engine

save_parquet
    Input:       DATAFRAME, DATA_FILE_NAME
    Example:     incorta.save_parquet(df, "myTable.parquet")
    Description: Saves a Spark data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_parquet
    Input:       PANDAS_DATAFRAME, DATA_FILE_NAME
    Example:     incorta.save_parquet(df, "myTable.parquet")
    Description: Saves a Pandas data frame as a Parquet file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_csv
    Input:       DATAFRAME, DATA_FILE_NAME, INDEX
    Example:     incorta.save_csv(df, "myTable.csv", index=False)
    Description: Saves a Spark data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.

save_csv
    Input:       PANDAS_DATAFRAME, DATA_FILE_NAME, INDEX
    Example:     incorta.save_csv(df, "myTable.csv", index=False)
    Description: Saves a Pandas data frame as a CSV file in the Data directory of the current tenant. The file can be used as a source for any other table in Incorta.
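
The following is a minimal sketch of a typical round trip using these operations. It assumes an already instantiated incorta object (see Configure the Data APIs below) and the SALES.SALES table from the examples above; the column names CUSTOMER_ID and AMOUNT are placeholders.

    # Read an Incorta table as a Spark data frame and inspect its schema
    df = incorta.read("SALES.SALES")
    df.printSchema()

    # Aggregate with Spark SQL; the column names here are placeholders
    totals = incorta.sql(
        "SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_AMOUNT "
        "FROM SALES.SALES GROUP BY CUSTOMER_ID"
    )

    # Save the result to the tenant's Data directory as Parquet so it
    # can be used as a source for other Incorta tables
    incorta.save_parquet(totals, "sales_totals.parquet")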
Note

The Data APIs currently support only local sessions: the Spark driver must run in the same environment as the notebook, with access to Incorta shared storage. The exceptions are the sql_pg_engine and sql_pg_nessy methods, which can be used in remote sessions.
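
For example, a notebook running outside the Incorta environment could query a business view through the SQLi engine; the view name and password below are placeholders:

    # Runs in a remote session; no local Spark driver access to
    # Incorta shared storage is required
    df = incorta.sql_pg_nessy("SELECT * FROM BusinessView.MyView", "incortapass")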

Configure the Data APIs

Here are the steps to configure the data APIs:

  • Install the required Python dependencies.

    pip install pyspark==SPARK_VERSION
    pip install pandas
    pip install findspark
    pip install fastparquet pyarrow    # required to save a Pandas data frame as Parquet
  • Install the Python library.

    python -m easy_install IncortaAnalytics/IncortaNode/bin/data_apis/python/incorta_data_apis.egg
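
    To confirm the library installed into the active Python environment, you can try importing it (the class name matches the initialization step below):

    python -c "from incorta_data_apis import IncortaAPI"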
  • Create an API key.

    • Sign in to the Incorta Direct Data Platform™.
    • Select the user icon in the top right corner → Security → Generate API Key.
  • Create a configuration file based on the template located at IncortaAnalytics/IncortaNode/bin/data_apis/python/data-api.conf.template.

    incorta.host =  // Incorta host, e.g., localhost, xx.xx.xx.xx
    incorta.port =  // Analytics port, e.g., 8080
    spark.home =    // Path to Spark home, e.g.,
                    // /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master =  // Spark master URL
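
    For reference, a filled-in file might look like the following. All values are examples for a single-node installation; local[*] runs the Spark driver locally, which matches the local-session requirement noted above.

    incorta.host = localhost
    incorta.port = 8080
    spark.home = /home/incorta/IncortaAnalytics/IncortaNode/spark
    spark.master = local[*]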
  • Initialize the library in your notebook or script.

    from incorta_data_apis import *
    incorta = IncortaAPI(TENANT_NAME, USER_NAME, API_KEY, PATH_TO_CONF_FILE)
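
    For example, with placeholder values throughout (the tenant name, user, API key, and configuration file path below are all assumptions):

    from incorta_data_apis import *

    # All four arguments are placeholders; substitute your own values
    incorta = IncortaAPI("demo", "admin", "YOUR_API_KEY",
                         "/home/incorta/data-api.conf")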
Note

For a comprehensive example, refer to sample-notebook.ipynb in IncortaAnalytics/IncortaNode/bin/data_apis/python.