Concepts → Physical Schema

About a Physical Schema

A physical schema defines one or more entity objects and the relationships between entity objects. It also defines how the Loader Service extracts, loads, and enriches data for the physical schema, and how the Analytics Service loads data into memory for the physical schema. The Incorta Metadata database captures this distinction. In the Incorta Metadata database, in the SCHEMA table, the SCHEMADATA field captures the XML metadata for the physical schema definition that includes all entity objects and join relationships. The LOADERDATA field captures the XML metadata for how the Loader Service loads the entity objects into Shared Storage and how the Analytics Service loads data from Shared Storage into memory.

A physical schema serves as a base object for other dependent objects such as a join to another physical schema entity object, a business schema, an insight on a dashboard tab, or an internal session variable.

A physical schema owner or another privileged user can grant access rights to a physical schema to one or more users or groups.

About entity objects in a physical schema

There are five types of entity objects in a physical schema:

Note

In this release, an Incorta SQL table is an Incorta Labs feature.

Depending on individual configurations and related data sources, an entity object in a physical schema may require significant resources for compute (CPU), memory (on-heap and off-heap), network, and shared storage.

As a result, schema developers and related administrators (tenant, Cluster Management Console (CMC), Apache Spark, and Linux host administrators) must optimize the design and load configuration of each physical schema design in a given tenant. The goal is to ensure that the Incorta cluster has adequate resources to efficiently and reliably support all workloads.

Warning: Avoid releasing an untested physical schema in a production environment

Because a physical schema represents a workload that will consume compute, memory, network, and shared storage, each physical schema requires testing and optimization before migration to a production environment.

About joins in a physical schema

A join defines how one or more entity objects relate. A join can be a self-join, a join between two entity objects in the same physical schema, or a join between two entity objects in different physical schemas.

To learn more about a join and how to create a join, start with Concepts → Join.

About a physical schema diagram

Using the Schema Diagram Viewer, a user can visually review the entity objects in the physical schema and the relationships between entity objects. To learn more, see Tools → Schema Diagram Viewer

Properties of a physical schema

Some properties for a physical schema are configurable. Other properties are read-only, but are helpful to monitor and understand. The three categories of properties for a physical schema are:

  • Object properties
  • Load job properties
  • In-memory properties

Object properties

A physical schema has the following object properties:

Property Description
Name The name of the physical schema. This is immutable.
Owner The login name (username) of the user that initially creates the physical schema
Description The description for the physical schema
Performance optimization Enables each Analytics Service in an Incorta cluster to load the physical schema into memory
Load order Configure the load order of physical schema entity objects (except Aliases) by one or more groups and and materialized views by one or more groups

Name property

A physical schema name must adhere to the following name validation rules:

  • Must be unique to the tenant
  • Must be between 1 and 250 characters in length
  • Must begin with an alpha character, lower or upper case
  • After the first alpha character, can contain zero or more alphanumeric characters in lower, upper, or mixed case
  • After the first alpha character, can contain zero or more underscore (_) or Dollar sign ($) characters
  • Besides underscore (_) and Dollar sign ($), cannot contain special characters, symbols, or spaces

Load job properties

Representing a workload, a physical schema also has aggregated load job properties:

Property Description
Last Load Status Describes the status of the last load as a date and time, Finished with errors or, during a load job, shows the current status such as In Queue, Extraction, Enrichment, Load, or Post-load
Loading Time Displays duration for the load
Tables The number of entity objects in the physical schema
Joins The number of joins between object in the physical schema
Rows The number of rows available to load into the memory of each Analytics Service in an Incorta cluster. When applicable, a Load Filter typically influences this value.
Data Size (Data on Disk) Size in KBs, MBs, or GBs for all files related to the physical schema load in shared storage, including Direct Data Mapping (DDM) files and Parquet files.

In-Memory properties

In the Cluster Management Console, a CMC Administrator can view the memory usage details for an Analytics Service or Loader Service for one or more Incorta Nodes in a given Incorta cluster. At a high-level, the CMC depicts memory usage in terms of both On-Heap and Off-Heap memory. In this release, the CMC also details the Off-Heap segmentation between Active memory and Pooled memory.

Memory usage details

The Memory Usage details dialog show Off-Heap memory.

Property Description
Name The object name such as a tenant, a table, data column, calculated column, or join column
Type The data type for a column or join column such as Boolean, Date, Double, Integer, Long, Null, Timestamp, or String)
Is In Memory True, False, and Partial are the possible values. True means all data and all join columns from all applicable entity objects are in memory. False means that none of the data or the join columns from all applicable entity objects are in memory. Partial means that some of the data (indexes or data columns) or of the joins are in-memory.
Loaded Memory The total off-heap memory.

It is also possible to view a more granular view of memory diagnostic details for Off-Heap memory using the memory.jsp page for a given Analytics or Load Service on a given Incorta Node. The memory diagnostics page details the following:

Property Description
Name Describes the name of the object such as a tenant, a table, data column, calculate column, join column
Type The data type for a column or join column such as Boolean, Date, Double, Integer, Long, Null, Timestamp, or String)
Data The combined size of the data as an index to the dictionary, the dictionary, and the hash table.
Dictionary The size of the actual data (key columns) as an array of unique values
Hash Table The size of hash table for the hash of key columns.
String Pool The size of of all key columns that are of type String stored as as a String Pool. A String pool is an array of characters that contains the number of characters in string followed by the string of characters.
Is In Memory True, False, and Partial are the possible values. True means all data and all join columns from all applicable entity objects are in memory. False means that none of the data or the join columns from all applicable entity objects are in memory. Partial means that some of the data (indexes or data columns) or some of the joins are in-memory.
Loaded Memory The total off-heap memory.

Access rights for a physical schema

A SuperUser (tenant administrator) or a user that belongs to a group assigned the SuperRole or the Schema Manager role can create, view, and edit the physical schemas in a given tenant.

Note: SuperUser and SuperRole access rights

Only a SuperUser (tenant administrator) or a user that belongs to a group assigned the SuperRole can view, edit, and share all physical schemas, regardless of ownership.

The following are the available access rights for a physical schema:

Property Icon Description
Can View eye The user can View the physical schema and the loaded data for specific entity objects, for example, in the Analyzer
Can Share node tree The user is able to Share the physical with other users and groups
Can Edit pen and paper The user is able to Edit the physical schema as well as execute a Load job for the physical schema
Important

The Edit access right for a physical schema also allows a user to execute a Load job for the physical schema or Load table job for given entity object.

Additional considerations for a physical schema

There are numerous considerations for designing and managing a physical schema ranging from referential integrity, data consistency, denormalization, formula column calculations, data duplication, pre-aggregation, enrichment, incremental loads, load job schedules, load durations, runtime security, migration, versioning, cross-schema joins, object dependency, and dashboard performance.