Configure Spark for Use with Materialized Views

These settings are the defaults inherited by all Materialized Views (MVs), whether written in Spark SQL or PySpark. With this in mind, it is recommended to configure defaults that suit the majority of your MVs, and to use the override capability on an MV-by-MV basis for those requiring more resources (cores, memory, driver) than average. This helps ensure that smaller MVs are not overallocated resources.

Default Materialized View Application Settings

For a selected cluster, you can set the Materialized View default values for the Spark Integration using the following settings:

  • Materialized view application cores
  • Materialized view application memory
  • Materialized view application executors

Note

The Spark Integration settings are global to all tenants in a cluster configuration.

Materialized view application cores

The number of CPU cores reserved for use by a materialized view. The default value is 1. The cores allocated to all running Spark applications combined cannot exceed the cores dedicated to the cluster.

Materialized view application memory

The maximum amount of memory, in gigabytes, that a materialized view can use. The default is 1 GB. The memory for all Spark applications combined cannot exceed the cluster memory (in gigabytes).

Materialized view application executors

The maximum number of executors that a single materialized view application can spawn. Each executor allocates a share of the cores defined in sql.spark.mv.cores and consumes part of the memory defined in sql.spark.mv.memory. Because cores and memory are divided equally among the executors, the number of executors should be a divisor of both sql.spark.mv.cores and sql.spark.mv.memory. For example, if you configure an application with cores=4, memory=8, and executors=2, Spark spawns 2 executors, each consuming 2 cores and 4 GB from the cluster.
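
As a minimal illustration of this division, the following Python sketch (the function name is hypothetical, not part of any Incorta or Spark API) computes the per-executor share for the example above:

    # Illustrative only: models how an MV application's total cores and
    # memory are divided equally among its executors.
    def per_executor_share(total_cores: int, total_memory_gb: int, executors: int):
        # The executor count must evenly divide both totals, as noted above.
        if total_cores % executors or total_memory_gb % executors:
            raise ValueError("executors must be a divisor of both cores and memory")
        return total_cores // executors, total_memory_gb // executors

    # cores=4, memory=8, executors=2 -> each executor gets 2 cores and 4 GB
    print(per_executor_share(4, 8, 2))  # prints (2, 4)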

Edit Default Materialized View Settings

To modify these settings and their default values:

  • In the navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Server Configurations.
  • In the left pane, select Spark Integration.
  • Set the value for a given Materialized view application setting:
    • Materialized view application cores
    • Materialized view application memory
    • Materialized view application executors
  • Select Save.

Materialized View-specific Settings

You can configure custom settings for a Materialized View that override the default cluster settings. To do this, add custom properties in the Materialized View Data Source window. For example, you can add the following properties for a specific MV (see the sketch after this list):

  • spark.executor.memory
  • spark.executor.cores
  • spark.cores.max
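
For instance, the property/value pairs entered for an MV might look like the following (the values are illustrative, not recommendations):

    spark.executor.memory   2g
    spark.executor.cores    2
    spark.cores.max         4

In Spark standalone mode, these values would allow the MV up to 4 cores in total, yielding at most two executors of 2 cores and 2 GB each.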

Spark Standalone Configuration

You can define additional job settings beyond those exposed in the CMC Spark Integration settings by editing the following configuration file: <incorta_home>/IncortaNode/spark/conf/spark-defaults.conf
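
For example, spark-defaults.conf uses a simple property-value format; entries such as the following (illustrative values, not recommendations) apply to every Spark application that does not override them:

    spark.executor.memory          4g
    spark.executor.cores           2
    spark.sql.shuffle.partitions   200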

Other settings related to networking and Spark Worker resources can be defined in the following configuration file: <incorta_home>/IncortaNode/spark/conf/spark-env.sh
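
For example, spark-env.sh is a shell script that exports standard Spark environment variables; entries such as the following (illustrative values) control Spark Worker resources and networking:

    # Hostname the Spark master binds to (illustrative value)
    SPARK_MASTER_HOST=spark-master.example.com
    # Total cores and memory a worker may offer to applications
    SPARK_WORKER_CORES=8
    SPARK_WORKER_MEMORY=16g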

Precedence of Materialized View Settings

When a Materialized View runs, its settings take effect in the following order of precedence, from highest to lowest:

  • Properties added/defined in the Materialized View itself
  • Materialized View settings defined in the CMC
  • Settings defined in spark-defaults.conf configuration file

Because the Materialized View settings in the CMC are mandatory, they always take precedence over the same settings defined in the spark-defaults.conf configuration file. Settings not exposed in the CMC run with Spark’s default configuration, or with the values defined in the configuration file.
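
As a concrete illustration, the following Python sketch (illustrative only, not Incorta code) models this precedence with collections.ChainMap, where earlier maps shadow later ones:

    from collections import ChainMap

    # Illustrative values: the same property defined at several levels.
    mv_properties  = {"spark.executor.memory": "2g"}             # set on the MV itself
    cmc_settings   = {"spark.executor.memory": "1g",
                      "spark.cores.max": "1"}                    # CMC defaults (mandatory)
    spark_defaults = {"spark.executor.memory": "4g",
                      "spark.sql.shuffle.partitions": "200"}     # spark-defaults.conf

    # Lookups consult the maps left to right, mirroring the order above.
    effective = ChainMap(mv_properties, cmc_settings, spark_defaults)
    print(effective["spark.executor.memory"])         # 2g  -> the MV property wins
    print(effective["spark.cores.max"])               # 1   -> falls back to the CMC
    print(effective["spark.sql.shuffle.partitions"])  # 200 -> only in spark-defaults.conf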