About JSON

JavaScript Object Notation (JSON) is a lightweight data-interchange format that is both human-readable and machine-readable. JSON is schema-less, its objects are unordered, and its data is often hierarchical in nature.

Here are the basic rules for JSON syntax:

  • curly braces { } surround an object
  • an object consists of zero or more "key":value pairs
  • a key is a string in quotes such as "key"
  • a colon separates the key from the value
  • a value is a string, number, object, array, boolean or null
  • a comma separates "key":value pairs
  • square brackets [ ] surround an array with a comma , separating values within the array
  • each value in an array must be of type string, number, object, array, boolean or null
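
As a quick illustration of these rules, here is a minimal Python sketch that parses a small JSON document with the standard json module. The document content (an invoice object and its fields) is made up for this example.

```python
import json

# A hypothetical document that uses every value type listed above.
document = """
{
  "invoice": {
    "id": 1001,
    "customer": "Acme",
    "paid": false,
    "notes": null,
    "lines": [
      {"sku": "A-1", "qty": 2},
      {"sku": "B-7", "qty": 1}
    ]
  }
}
"""

parsed = json.loads(document)
invoice = parsed["invoice"]

# JSON objects become dict, arrays become list, null becomes None.
print(type(invoice).__name__)           # dict
print(type(invoice["lines"]).__name__)  # list
print(invoice["notes"] is None)         # True
print(invoice["lines"][1]["sku"])       # B-7
```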

About the JSON Connector

The JSON connector uses the CData JDBC driver for JSON. So that you can query a JSON data source with a SQL SELECT statement, the driver supports the configuration of various JDBC connection properties, including OAuth authentication. The JSON connector also supports JSONPath query expressions that split an array of values into rows.

The connector allows you to connect to local and remote JSON data sources using a Uniform Resource Identifier (URI). Supported service and storage providers include AWS S3, Azure, Dropbox, Box, Google Cloud Storage, Google Drive, OneDrive, SharePoint, FTP, and more.

In addition, the JSON connector allows for the configuration of a JSON data model: Document, FlattenedDocuments, or Relational.
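
To give a rough feel for how the three data models differ, the following Python sketch shapes one nested document three ways. This is only an illustration of the general idea (nested data kept as an aggregate, joined into one table, or split into related tables), not the driver's implementation. The sample document, the function names, and the column names such as orders.sku and parent_id are all hypothetical.

```python
import json

# Hypothetical source document with one nested array.
doc = {"id": 1, "name": "Acme", "orders": [{"sku": "A"}, {"sku": "B"}]}

def document_model(d):
    # One row per document; the nested array stays as a JSON string.
    return [{"id": d["id"], "name": d["name"], "orders": json.dumps(d["orders"])}]

def flattened_model(d):
    # One row per nested element, with the parent fields repeated on each row.
    return [
        {"id": d["id"], "name": d["name"], "orders.sku": order["sku"]}
        for order in d["orders"]
    ]

def relational_model(d):
    # Separate parent and child tables, linked by a key column.
    parent = [{"id": d["id"], "name": d["name"]}]
    child = [{"parent_id": d["id"], "sku": order["sku"]} for order in d["orders"]]
    return parent, child

print(document_model(doc))
print(flattened_model(doc))
print(relational_model(doc))
```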

The JSON connector uses the cdata.jdbc.json.jar driver to connect to a JSON data source and get data.
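
The idea of a JSONPath expression selecting an array whose elements become rows can be sketched in Python as follows. The helper rows_at_path is hypothetical and handles only a simple dotted path such as $.invoices, which is a small subset of JSONPath. The sample data is made up.

```python
import json

def rows_at_path(document, json_path):
    """Return the array found at a simple dotted path such as '$.invoices'.

    Hypothetical helper: supports only '$' followed by '.name' segments.
    """
    node = document
    for segment in json_path.lstrip("$").split("."):
        if segment:  # skip the empty piece left by the leading '.'
            node = node[segment]
    if not isinstance(node, list):
        raise ValueError(f"{json_path} does not point at an array")
    return node

data = json.loads("""
{"company": "Acme",
 "invoices": [
   {"id": 1, "total": 10.5},
   {"id": 2, "total": 99.0}
 ]}
""")

# Each element of the selected array is treated as one row.
for row in rows_at_path(data, "$.invoices"):
    print(row["id"], row["total"])
```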

Note

The JSON connector requires a JAR file that Incorta tests and verifies. The supported JAR file is available only from Incorta Support and must be purchased from Incorta. The JSON connector exposes various properties of the CData JDBC driver for JSON for an external data source. For information about the driver, see CData JDBC Driver for JSON.

The JSON connector supports the following Incorta specific functionality:

Feature | Supported
Chunking
Data Agent
Encryption at Ingest
Incremental Load
Multi-Source
OAuth
Performance Optimized
Remote
Single-Source
Spark Extraction
Webhook Callbacks

Deploy the JAR file

The JSON Connector requires the following JAR file:

  • cdata.jdbc.json.jar

The JSON connector requires the deployment of a JAR file to the Incorta Node hosts of the Analytics Service and the Loader Service. A systems administrator with root access to the host can deploy the JAR file. A CMC Administrator can restart the Incorta cluster.

Here are the steps to copy the JAR file to a standalone Incorta cluster:

  • Secure copy the cdata.jdbc.json.jar file to the host. Here is an example using scp:

    INCORTA_NODE_HOST=100.101.102.103
    cd ~/Downloads
    scp -i ~/.ssh/host_pemkey.pem cdata.jdbc.json.jar incorta@${INCORTA_NODE_HOST}:/tmp/
  • Secure shell into the host:

    ssh -i ~/.ssh/host_pemkey.pem incorta@${INCORTA_NODE_HOST}
  • As the incorta user, copy cdata.jdbc.json.jar to the IncortaNode/runtime/lib/ directory in a bash shell:

    # Switch to the incorta user before copying
    sudo su incorta
    INCORTA_INSTALLATION_PATH=/home/incorta/IncortaAnalytics
    cp /tmp/cdata.jdbc.json.jar "$INCORTA_INSTALLATION_PATH/IncortaNode/runtime/lib/cdata.jdbc.json.jar"
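
After the copy, you may want to confirm that the JAR file landed where the steps above put it. Here is a minimal Python sketch of such a check. The function name jar_is_deployed is hypothetical, and the installation path is the example path used in the steps above.

```python
from pathlib import Path

JAR_NAME = "cdata.jdbc.json.jar"

def jar_is_deployed(installation_path):
    """Return True if the driver JAR exists and is non-empty under IncortaNode/runtime/lib."""
    jar = Path(installation_path) / "IncortaNode" / "runtime" / "lib" / JAR_NAME
    return jar.is_file() and jar.stat().st_size > 0

if __name__ == "__main__":
    # Example installation path from the steps above; adjust for your host.
    print(jar_is_deployed("/home/incorta/IncortaAnalytics"))
```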

Here are the steps to restart the standalone Incorta cluster:

  • Sign in to the Cluster Management Console (CMC) as the CMC Administrator.
  • In the Navigation bar, select Clusters.
  • Select the cluster name in the list.
  • In Details, select Restart.

Steps to connect a JSON data source and Incorta

To connect a JSON data source and Incorta, here are the high-level steps, tools, and procedures:

Create an external data source

A Tenant Administrator (Super User), a user that belongs to a group with the SuperRole role, or a user that belongs to a group with the Schema Manager role can create an external data source for a given tenant.

Here are the steps to create an external data source with the JSON connector:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in File System, select JSON.
  • In the New Data Source dialog, specify the applicable connector properties.
  • To test, select Test Connection.
  • Select Ok to save your changes.

The JSON connector properties

Here are the properties for the JSON connector:

Property | Control | Description
Data Source Name | text box | Required. Enter the name of the data source.
URI | text box | Required. Enter the Uniform Resource Identifier (URI) for the JSON data source location.
JSON Path | text box | Optional. Enter the JSONPath of an array element that defines the separation of rows.
JSON Format | drop down list | Required. Select the format of the JSON data source from the following options:
  ●  JSON
  ●  JSONRows
  ●  Line-Delimited JSON
Use Connection Pooling | toggle | Optional. Enable to set the related properties for connection pooling.
Data Model | drop down list | Required. Select the data model to use when parsing documents and generating the database metadata. The available options to select from are:
  ●  Document
  ●  FlattenedDocument
  ●  Relational
For more information, please refer to Modeling JSON Data.
Generate Schema Files | drop down list | Required. Select when schemas should be generated and saved. Available options to select from are:
  ●  Never
  ●  OnUse
  ●  OnStart
  ●  OnCreate
Schema Location | text box | If you select to generate schema files, enter the path to the directory that you want to use to save the schema files. The default is ./home/incorta/schema.
Service Provider | drop down list | Required. Select the local or remote source or service for the JSON data source. The Service Provider selection dynamically affects the available properties for configuration.
Show Advanced Options | toggle | Optional. Enable to configure the advanced properties. For more information, please refer to Establishing a Connection.
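
To show how properties like these relate to the underlying driver, here is a hedged Python sketch that assembles a JDBC URL in the jdbc:json: key=value style used by the CData driver. How Incorta actually builds the connection from the dialog is not described here, so treat the mapping as an assumption. The property names used (URI, JSONPath, DataModel, GenerateSchemaFiles, Location) should be checked against the driver documentation, and the file path is hypothetical.

```python
def build_jdbc_url(properties):
    """Join key=value pairs into a 'jdbc:json:' URL, skipping empty values."""
    parts = [
        f"{key}={value}"
        for key, value in properties.items()
        if value not in (None, "")
    ]
    return "jdbc:json:" + ";".join(parts) + ";"

url = build_jdbc_url({
    "URI": "/home/incorta/data/invoices.json",  # hypothetical local file
    "JSONPath": "$.invoices",
    "DataModel": "Relational",
    "GenerateSchemaFiles": "OnUse",
    "Location": "/home/incorta/schema",
    "Other": "",                                # empty values are dropped
})
print(url)
```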

Create a schema with the Schema Wizard

Here are the steps to create a JSON schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:
    • For Enter a name, enter the schema name.
    • For Select a Datasource, select the JSON data source.
    • Optionally, enter a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data Panel, first select the name of the Data Source, and then check the Select All checkbox.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.

Create a schema with the Schema Designer

Here are the steps to create a JSON schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In the Create Schema dialog, in Name, specify the schema name, and then select Save.
  • In Start adding tables to your schema, select JSON.
  • In the Data Source dialog, specify the JSON table data source properties.
  • Select Add.
  • In the Table Editor, in the Table Summary section, enter the table name.
  • To save your changes, in the Action bar, select Done.

JSON table data source properties

For a schema table in Incorta, you can define the following JSON-specific data source properties:

Property | Control | Description
Type | drop down list | Default is JSON.
Data Source | drop down list | Select the JSON external data source.
Incremental | toggle | Enable the incremental load configuration for the schema table. See Types of Incremental Load.
Incremental Extract Using | drop down list | Enable Incremental to configure this property. Select between Last Successful Extract Time and Maximum Value of a Column. See Types of Incremental Load.
Incremental Column | drop down list | Enable Incremental and select Maximum Value of a Column to configure this property. Select the column to be used for Maximum Value of a Column. The Loader will track and use the greatest value or most recent timestamp for each load operation.
Query | text box | Enter the SQL SELECT query to retrieve data from the JSON data set.
Update Query | text box | Enable Incremental to configure this property. Enter the SQL SELECT query to use during an incremental load. The query and update query should be of the same structure, that is, the same selected columns.
Incremental Field Type | drop down list | Enable Incremental to configure this property. Select the format of the incremental field:
  ●  Timestamp
  ●  Unix Epoch (seconds)
  ●  Unix Epoch (milliseconds)
Fetch Size | text box | For performance improvement, define the number of records that will be retrieved from the database in each batch until all records are retrieved. The default is 5000.
Chunking Method | drop down list | Chunking methods allow for parallel extraction of large tables. The default is No Chunking. There are two chunking methods:
  ●  By Size of Chunking (Single Table)
  ●  By Date/Timestamp
Chunk Size | text box | Select By Size of Chunking for the Chunking Method to set this property. Enter the number of records to extract in each chunk in relation to the Fetch Size. The default is 3 times the Fetch Size.
Order Column | drop down list | Select By Size of Chunking for the Chunking Method to set this property. Select a column in the source table you want to order by before chunking. It's typically an ID column and it must be numeric.
Upper Bound for Order Column | text box | Optional. Enter the maximum value for the order column.
Lower Bound for Order Column | text box | Optional. Enter the minimum value for the order column.
Order Column [Date/Timestamp] | drop down list | Select By Date/Timestamp for the Chunking Method to set this property. Select a column in the source table you want to order by before chunking. It should be a Date/Timestamp column.
Chunk Period | drop down list | Select the chunk period that will be used in dividing chunks:
  ●  Daily
  ●  Weekly (default)
  ●  Monthly
  ●  Yearly
  ●  Custom
Number of days | text box | Select Custom for the Chunk Period to set this property. Enter the chunking period in days.
Callback | toggle | Enable post extraction callback, that is, enabling callback on the data source data set(s) by invoking a certain callback URL with parameters containing details about the load job.
Callback URL | text box | Enable Callback to configure this property. Specify the callback URL.
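
The chunking properties above can be pictured with a small Python sketch that splits a numeric order column, or a date range, into chunks that could be extracted in parallel. How the Loader actually partitions a table is not specified here, so this only illustrates the general idea. The function names and the sample bounds are hypothetical.

```python
from datetime import date, timedelta

def size_chunks(lower, upper, chunk_size):
    """Yield inclusive (start, end) ranges covering lower..upper on a numeric order column."""
    start = lower
    while start <= upper:
        end = min(start + chunk_size - 1, upper)
        yield start, end
        start = end + 1

def date_chunks(first, last, days):
    """Yield half-open (start, end) date ranges of `days` days covering first..last."""
    start = first
    while start <= last:
        end = start + timedelta(days=days)
        yield start, end
        start = end

fetch_size = 5000
chunk_size = 3 * fetch_size  # default relation stated in the table above

print(list(size_chunks(1, 40000, chunk_size)))
# [(1, 15000), (15001, 30000), (30001, 40000)]

# A Weekly chunk period corresponds to 7 days.
for start, end in date_chunks(date(2024, 1, 1), date(2024, 1, 20), 7):
    print(start, end)
```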

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the JSON schema.
  • In the Schema Designer, in the Action bar, select Diagram.

Load the schema

Here are the steps to perform a Full Load of the JSON schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the JSON schema.
  • In the Schema Designer, in the Action bar, select Load → Load Now → Full.
  • To review the load status, in Last Load Status, select the date.

Explore the schema

With the full load of the JSON schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the JSON schema.
  • In the Schema Designer, in the Action bar, select Explore Data.

For more information about how to use the Analyzer to create insights, see Analyzer and Visualizations.

Additional Considerations

Types of Incremental Load

You can enable Incremental Load for a JSON table data source. There are two types of incremental extracts:

Last Successful Extract Time

Fetch updates since the last time the tables were loaded. This is determined by the difference between the current time and the database timestamp.

Maximum Value of a Column

The column-based strategy depends on an extra column called "Incremental Column" in each table. The JSON connector supports both timestamp and numeric columns. A timestamp column is of the type date or timestamp. A numeric column is of the type int or long.

Note

Changing the incremental load strategy requires a full load to ensure data integrity.
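
The Maximum Value of a Column strategy amounts to remembering the greatest incremental column value seen so far and using it as the starting point for the next load. Here is a minimal Python sketch of that bookkeeping. It is an illustration only, not the Loader's implementation, and the function name and sample rows are hypothetical.

```python
def next_incremental_value(rows, column, previous=None):
    """Return the greatest value of `column` seen so far, or `previous` if there are no rows."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return previous
    newest = max(values)
    return newest if previous is None else max(previous, newest)

# First load: no previous value is known.
first_batch = [{"InvoiceId": 10}, {"InvoiceId": 42}, {"InvoiceId": 17}]
marker = next_incremental_value(first_batch, "InvoiceId")
print(marker)  # 42

# Later load with no new rows: the marker is unchanged.
marker = next_incremental_value([], "InvoiceId", previous=marker)
print(marker)  # 42
```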

Incremental Load Example

In this example, the invoices table must contain a column of the type Date or Timestamp in order to load the table incrementally with a last successful extract time strategy. In this case, the name of the date column is ModifiedDate and the format of the column is Timestamp.

Here are the data source property values for this example:

Incremental is enabled

Query contains SELECT * FROM invoices

Update Query contains SELECT * FROM invoices WHERE ModifiedDate > ?

Note

? is a variable in the update query that contains the last schema refresh date.

Incremental Field Type = Timestamp

Note

When running an update query for an incremental load, you can use the ? reference character. The ? character is replaced with the last incremental reference value to construct a valid query to the data source. The ? reference character is not valid in a standard query.
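
To make the ? substitution concrete, here is a Python sketch that fills the placeholder according to the Incremental Field Type. It uses plain text replacement purely for illustration. The real connector may bind the value as a JDBC parameter instead, and the exact timestamp format it uses is an assumption. The function name render_update_query is hypothetical.

```python
from datetime import datetime, timezone

def render_update_query(update_query, last_value, field_type="Timestamp"):
    """Replace the single ? placeholder with the last incremental reference value."""
    if update_query.count("?") != 1:
        raise ValueError("expected exactly one ? placeholder")
    if field_type == "Timestamp":
        # Assumed literal format; the driver may expect a different one.
        literal = "'" + last_value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    elif field_type == "Unix Epoch (seconds)":
        literal = str(int(last_value.timestamp()))
    elif field_type == "Unix Epoch (milliseconds)":
        literal = str(int(last_value.timestamp() * 1000))
    else:
        raise ValueError(f"unknown field type: {field_type}")
    return update_query.replace("?", literal)

last_load = datetime(2024, 1, 15, 8, 30, 0, tzinfo=timezone.utc)
update_query = "SELECT * FROM invoices WHERE ModifiedDate > ?"

print(render_update_query(update_query, last_load))
# SELECT * FROM invoices WHERE ModifiedDate > '2024-01-15 08:30:00'
print(render_update_query(update_query, last_load, "Unix Epoch (seconds)"))
```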

Valid Query Types

When creating a query for the JSON connector, only SELECT statements are valid.
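
If you generate queries programmatically, a simple pre-check can catch non-SELECT statements before they reach the connector. The following Python sketch is a deliberately naive check with a hypothetical function name. It only looks at the first keyword and does not parse SQL, so it ignores comments and any other statement forms.

```python
import re

# Matches optional leading whitespace followed by the SELECT keyword.
_SELECT = re.compile(r"^\s*select\b", re.IGNORECASE)

def is_select_statement(query):
    """Return True if the query text begins with the SELECT keyword."""
    return bool(_SELECT.match(query))

print(is_select_statement("SELECT * FROM invoices"))     # True
print(is_select_statement("  select id from invoices"))  # True
print(is_select_statement("DELETE FROM invoices"))       # False
```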