Notebook Add-on

About Notebooks

A notebook is an interactive environment in which you can explore, manipulate, and transform data. You can write code iteratively and inspect the results before saving the code for export to the materialized view.

  • A notebook consists of one or more paragraphs.
  • A paragraph consists of a code section and a result section.
    • In the code section, you can use a language-specific editor to write either PySpark or SQL code. You can execute code in the code section using paragraph commands.
    • When there are executed results, you can view the output in the result section of the paragraph.

The Notebook Add-on service runs as an application in Apache Spark and manages paragraph execution requests. When you run more than one paragraph, the Notebook Add-on service processes the paragraphs sequentially: the second paragraph starts only after the first completes.
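The sequential behavior described above can be pictured with a small model. This is only an illustrative sketch in plain Python, not the Notebook Add-on's implementation: the `Paragraph` class, the `run_all` function, and the sample results are all hypothetical names invented for the example.

```python
class Paragraph:
    """Hypothetical model of a paragraph: a code section plus a result section."""

    def __init__(self, language, code):
        self.language = language  # "sql" or "python"
        self.code = code          # a callable standing in for the code section
        self.result = None        # result section, filled in after execution


def run_all(paragraphs):
    """Run paragraphs one at a time, in order.

    Each call blocks until the paragraph finishes, so paragraph N+1
    never starts before paragraph N has completed.
    """
    log = []
    for index, paragraph in enumerate(paragraphs, start=1):
        paragraph.result = paragraph.code()
        log.append(f"paragraph {index} ({paragraph.language}) done")
    return log


# Two stand-in paragraphs; the lambdas replace real SQL or PySpark code.
notebook = [
    Paragraph("sql", lambda: "3 rows"),
    Paragraph("python", lambda: "schema printed"),
]
log = run_all(notebook)
```

After `run_all` returns, each paragraph's `result` attribute holds its output, mirroring how the result section is populated only once the code section has been executed.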

Code Execution Language for a Notebook

In Incorta 4.6, a notebook-defined materialized view supports two interoperable languages, SQL and Python. This means that one paragraph can be written in SQL and another in Python.

When creating a materialized view, in the Data Source dialog, you must select a Language. The choices are SQL or Python.

  • SQL represents the execution of SQL using the Spark SQL library.
  • Python represents the execution of PySpark, which is the Python API for Spark.

Apache Spark executes all materialized views and natively runs Spark SQL queries using columnar data stored as Apache Parquet files in Shared Storage (Staging).
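To make the difference between the two language choices concrete, here is a minimal sketch of the same aggregation written once as Spark SQL and once with the PySpark DataFrame API. It uses only standard Spark calls. The Parquet path, the `sales` view name, and the column names are hypothetical, and the way a notebook paragraph actually obtains its Spark session and source tables in Incorta is not shown here.

```python
# Hypothetical Parquet location; the real Shared Storage path depends on the deployment.
PARQUET_PATH = "/path/to/staging/sales"

# What a SQL paragraph might contain (executed through Spark SQL).
SQL_PARAGRAPH = """
SELECT product_id, SUM(amount) AS total_amount
FROM sales
GROUP BY product_id
"""


def run_sql_paragraph(spark):
    """Register the Parquet data as a temp view, then query it with Spark SQL."""
    spark.read.parquet(PARQUET_PATH).createOrReplaceTempView("sales")
    return spark.sql(SQL_PARAGRAPH)


def run_python_paragraph(spark):
    """Express the same aggregation with the PySpark DataFrame API."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    sales = spark.read.parquet(PARQUET_PATH)
    return sales.groupBy("product_id").agg(F.sum("amount").alias("total_amount"))
```

Both functions expect an existing `SparkSession` as `spark` and return a DataFrame with the same two columns, so either form could feed a later step.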

Edit Notebook Dialog

The Edit Notebook dialog fulfills two functions:

  • It specifies, in the dialog title, the notebook language for export to the materialized view.
  • It contains the notebook layout, which consists of a toolbar and one or more interactive paragraphs.

Notebook Requirements

There are several requirements for implementing the Incorta Labs Notebook Integration:

  • The host must run a supported Linux operating system.
  • Apache Spark 2.4.3 must be running and properly configured for the Incorta Cluster instance.
  • An Incorta Cluster can only have a single Notebook Add-on.
  • The Incorta Node hosting the Notebook Add-on requires Python 3.6 or Python 3.7. Python 3.8 is not yet supported.
  • On the Incorta Node hosting the Notebook Add-on, the notebook port must be open. The default port is 5500, unless you configure a different port.
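The Python version and port requirements above lend themselves to a quick pre-flight check. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the port check only tests whether something accepts TCP connections on the port, which is a rough proxy for "open" and says nothing about firewall rules when no service is listening yet.

```python
import socket
import sys

SUPPORTED_PYTHON = {(3, 6), (3, 7)}  # versions named in the requirements list
DEFAULT_NOTEBOOK_PORT = 5500         # default notebook port


def python_version_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is 3.6 or 3.7."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def port_accepts_connections(host, port=DEFAULT_NOTEBOOK_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `python_version_supported()` with the interpreter that the Incorta Node uses reports whether that interpreter falls within the supported range.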

Notebook Integration Process

Before using the Notebook Add-on, you must first integrate the Notebook into an Incorta Cluster. Notebook Integration requires the completion of several key tasks in the CMC:

  1. Create the Notebook Add-on service.
  2. Set the Notebook integration properties in Server Configurations.
  3. Enable the Notebook Integration.
  4. Start the Notebook service.

Create Notebook Add-on

You can install the Notebook Add-on during a new installation or after installation.

There are two types of cluster installations:

  • Single Host is a standalone instance that uses the Typical installation method.
  • Multi-host requires a Custom installation.

Both cluster topologies are applicable to Incorta Notebooks.

To configure and install a Notebook Add-on during a Single Host (typical) Installation:

  1. In the Configuration Wizard, for Add-ons, specify the Notebook port value. The default value is 5500.
  2. Select Next to continue the configuration review.
  3. Select Create.

Here are the steps to configure and install a Notebook Add-on after a Single Host (Typical) or Multi-host (Custom) installation:

After a Single Host (Typical) Installation:

  1. In the navigation bar, select Nodes.
  2. In the nodes list, select the localNode.
  3. In the canvas, select the Add-ons tab.
  4. In the Add-ons header, select + (Add) to create a Notebook.
  5. In the Create a new notebook dialog, enter the Port number. The default value is 5500.
  6. Select Save.

After Multi-host (Custom) Installation:

  1. In the navigation bar, select Nodes.
  2. In the nodes list, select an Incorta Node.
  3. In the canvas, select the Add-ons tab.
  4. In the Add-ons header, select + (Add) to create a Notebook.
  5. In the Create a new notebook dialog, enter the Notebook Name and the Port number. The default value is 5500.
  6. Select Save.

Set Notebook Integration Properties

The Notebook Integration settings are global to all tenants in a cluster configuration. For the selected Cluster, you can set default values for the following settings:

  • Notebook Max Cores: Maximum number of CPU cores to use for all notebook executors.
  • Notebook Max Memory: Maximum amount of memory to use for all notebook executors, in the same format as JVM memory strings with a size unit suffix (“k”, “m”, “g” or “t”) (e.g. 512m, 2g).
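
The memory string format described above can be checked before you enter a value. This is a hypothetical helper, not part of Incorta or the CMC. It assumes binary multiples (1 k = 1024 bytes), which is how Spark interprets its own memory settings; whether the CMC applies exactly the same rules is an assumption.

```python
import re

# Assumed binary multipliers for each size unit suffix.
_UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_memory_string(value):
    """Convert a string such as '512m' or '2g' to a number of bytes.

    Raises ValueError when the value is not a whole number followed by
    exactly one of the suffixes k, m, g, or t.
    """
    match = re.fullmatch(r"(\d+)([kmgt])", value.strip().lower())
    if match is None:
        raise ValueError(f"not a valid memory string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]
```

Under these assumptions, `parse_memory_string("512m")` yields 536870912 bytes, while a value such as "2 gb" is rejected.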

To modify these settings and their default values:

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select Cluster Configurations.
  4. In the panel tabs, select Server Configurations.
  5. In the left pane, select Notebook Integration.
  6. Set the default value(s) for Notebook Max Cores and/or Notebook Max Memory.
  7. Select Save.

Enable the Notebook Integration

After you set the Notebook Integration properties, you can enable the Incorta Labs Notebook feature.

To enable Notebook Integration as the default tenant configuration in the CMC:

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select Cluster Configurations.
  4. In the panel tabs, select Default Tenant Configurations.
  5. In the left pane, select Incorta Labs.
  6. In the right pane, toggle Notebook Integration to enable.
  7. Select Save.

To enable Notebook Integration for a specific tenant configuration in the CMC:

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select the Tenants tab.
  4. In the Tenant list, select Configure for the given Tenant.
  5. In the left pane, select Incorta Labs.
  6. In the right pane, toggle Notebook Integration to enable.
  7. Select Save.

Start, Stop, and Restart Notebook

To start, stop, or restart a Notebook:

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select Add-ons.
  4. In the nodes list, select the Notebook name.
  5. In Notebook details, select Restart, Stop, or Start.

Edit the Notebook Port

  1. In the navigation bar, select Clusters.
  2. In the cluster list, select a Cluster name.
  3. In the canvas tabs, select Add-ons.
  4. In the nodes list, select the Notebook name.
  5. In Notebook details, select Edit in the title.
  6. Change the Port value.
  7. Select Update.

After you change the Notebook port, you must restart the Notebook for the change to take effect.