
About Google Drive

Google Drive is Google’s cloud-storage service that lets you store, share, and collaborate on files and folders from any mobile device, tablet, or computer. Google Drive is included with a Google Account or a G Suite account.

About the Google Drive connector

With the Google Drive connector, you can create a data source for a Google Drive file or folder. The Google Drive connector supports the following file extensions:

  • .csv
  • .tsv
  • .tab
  • .txt
  • .xlsx
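If you want to pre-check a folder before you create a data source, the following Python sketch maps a file name to the File Type value (Text or Excel) that the supported extensions above correspond to. It is an illustration, not Incorta code: the function name `file_type_for` is hypothetical, and treating extensions case-insensitively is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Supported extensions, grouped by the File Type used in the table data source properties.
TEXT_EXTENSIONS = {".csv", ".tsv", ".tab", ".txt"}
EXCEL_EXTENSIONS = {".xlsx"}


def file_type_for(file_name: str) -> Optional[str]:
    """Return "Text" or "Excel" for a supported file, or None otherwise.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "Text"
    if suffix in EXCEL_EXTENSIONS:
        return "Excel"
    return None


print(file_type_for("orders.csv"))  # Text
print(file_type_for("budget.xls"))  # None: the legacy .xls format is not in the list
```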

You can access all folders and files that you own and any folders or files that someone shares with you.

If you want to create a data source for Google Sheets, you must use the dedicated Google Sheets connector. The Google Sheets connector also lets you select a specific folder or file in Google Drive. For that reason, you must enable and configure the Google Drive connector before you can use the Google Sheets connector.

Note

In this release, you cannot access folders or files in a Shared Drive.

The Google Drive connector supports the following Incorta specific functionality:


Feature | Supported
Chunking
Data Agent
Encryption at Ingest
Incremental Load
Multi-Source
OAuth
Performance Optimized
Remote
Single-Source
Spark Extraction
Webhook Callbacks

Important

Google Drive allows you to store multiple files with the same name as well as multiple versions of the same file. The Google Drive connector supports only files in My Drive that have a unique file name, and it accesses only the current version of a file.
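Because the connector only supports files with a unique name, it can help to find duplicate names before you select files. This Python sketch assumes you already have a listing of `(file_id, name)` pairs for the files you plan to use, for example from the Google Drive API `files.list` method; the helper name `duplicate_names` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_names(files: Iterable[Tuple[str, str]]) -> List[str]:
    """Return file names that occur more than once in a (file_id, name) listing."""
    counts = Counter(name for _file_id, name in files)
    return sorted(name for name, count in counts.items() if count > 1)


listing = [("1a", "orders.csv"), ("2b", "orders.csv"), ("3c", "customers.csv")]
print(duplicate_names(listing))  # ['orders.csv']
```

Older versions of a file are revisions under the same file ID in the Google Drive API, so they do not show up as duplicate entries in such a listing; only distinct files that share a name do.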

The Google Drive connector requires the following:

  • Security configurations
  • Default and Tenant Configurations

Some configurations may differ if you deploy the Google Drive connector in an Incorta Cloud instance. For example, an Incorta Cluster in Incorta Cloud supports HTTPS natively.

Security configurations for the Google Drive connector

Security and system administrators typically address the security requirements for the Google Drive connector. Because the connector uses Google APIs, it requires the following:

  • HTTPS for the Incorta Cluster
  • G Suite account
  • Google API Project with both the Google Drive API and the Google Sheets API enabled

HTTPS for the Incorta Cluster

To use the Google Drive connector or the Google Sheets connector, you must configure your Incorta Cluster to use HTTPS (Hypertext Transfer Protocol Secure). Typically, a system administrator with root access to the operating system configures the Incorta Cluster for HTTPS.

Important

The Google APIs do not accept self-signed security certificates. You must use a valid certificate for a known public domain.

To learn more about how to configure HTTPS with TLS/SSL for your Incorta Cluster using Let’s Encrypt, Certbot and OpenSSL, please review Security → HTTPS for Apache Tomcat with OpenSSL.

Client Credentials

A Security Administrator or System Administrator who manages both your organization’s G Suite accounts and your Incorta Cluster creates the required Google API project. Using a G Suite account, the administrator signs in to the Google Developers Console, creates a project, creates an OAuth consent screen, and then creates the client credentials.

To learn more about how to create client credentials for a Google API project, please review Security → Client credentials for a Google Drive API project.
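For orientation, client credentials of this kind are used in a standard Google OAuth 2.0 authorization request. Google expects the redirect URI of a web application to use HTTPS (localhost aside), which is consistent with the HTTPS requirement above. The Python sketch below builds such a request URL with the standard library only. The redirect URI, the read-only scopes, and the helper name `build_auth_url` are assumptions for illustration; the values Incorta actually sends may differ.

```python
from urllib.parse import urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# Assumed read-only scopes for Drive and Sheets; Incorta's actual scopes may differ.
READ_ONLY_SCOPES = (
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/spreadsheets.readonly",
)


def build_auth_url(client_id: str, redirect_uri: str, scopes=READ_ONLY_SCOPES) -> str:
    """Build a Google OAuth 2.0 authorization URL (hypothetical helper)."""
    if urlparse(redirect_uri).scheme != "https":
        # Mirrors the requirement that the Incorta Cluster use HTTPS.
        raise ValueError("redirect_uri must use https")
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",    # authorization code flow
        "scope": " ".join(scopes),  # space-delimited scope list
        "access_type": "offline",   # request a refresh token
        "prompt": "consent",
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


# Hypothetical client ID and redirect URI:
print(build_auth_url("123.apps.googleusercontent.com",
                     "https://incorta.example.com/oauth/callback"))
```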

Default and Tenant Configurations for the Google Drive Connector

A Cluster Management Console (CMC) administrator for your Incorta Cluster must configure each tenant to use the client credentials.

Important

After configuration, you must restart the Analytics Service, Loader Service, and any add-ons such as the Notebook Service.

Specify the client credentials for the Default Tenant Configuration

Here are the steps to specify the required properties for the Default Tenant Configuration:

  • Sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Integration.
  • In the right pane, specify the following:
    • Your Client ID in Google Drive Client ID.
    • Your Client Secret in Google Drive Client Secret.
  • Select Save.

Specify the client credentials for a Tenant Configuration

Here are the steps to specify the required properties for a specific tenant:

  • Sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Tenant.
  • For the given tenant, select Configure.
  • In the left pane, select Integration.
  • In the right pane, specify the following:
    • Your Client ID in Google Drive Client ID.
    • Your Client Secret in Google Drive Client Secret.
  • Select Save.
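If you set the credentials in both the Default Tenant Configuration and a specific tenant, it can help to think of the tenant values as overriding the defaults. The following Python sketch resolves the effective values under that assumption: a non-empty tenant value takes precedence over the default. Confirm the actual precedence for your Incorta version; the function `effective_credentials` is hypothetical.

```python
from typing import Dict, Optional

# Property names as they appear in the CMC Integration pane.
CREDENTIAL_KEYS = ("Google Drive Client ID", "Google Drive Client Secret")


def effective_credentials(default_cfg: Dict[str, str],
                          tenant_cfg: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Prefer a non-empty tenant value, else fall back to the default (assumed precedence)."""
    return {key: tenant_cfg.get(key) or default_cfg.get(key) for key in CREDENTIAL_KEYS}


defaults = {"Google Drive Client ID": "default-id", "Google Drive Client Secret": "default-secret"}
tenant = {"Google Drive Client ID": "tenant-id", "Google Drive Client Secret": ""}
print(effective_credentials(defaults, tenant))
# {'Google Drive Client ID': 'tenant-id', 'Google Drive Client Secret': 'default-secret'}
```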

Restart the Incorta Services

Here are the steps to restart the various services in an Incorta Cluster from the Cluster Management Console (CMC).

  • As the CMC Administrator, sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the Details canvas tabs, in the footer bar, select Restart.

Steps to Connect Google Drive and Incorta

To connect Google Drive and Incorta, complete the following high-level steps:

Create an external data source

Here are the steps to create an external data source with the Google Drive connector:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in File System, select Google Drive.
  • In the New Data Source dialog, specify the applicable connector properties.
  • To test, select Test Connection.
  • Select Ok to save your changes.

Note

If you select the lowest folder in the tree, the Select Directory from dialog shows No Data. You can still access the files in this folder when you create a schema. However, you cannot select the parent folder.

Google Drive connector properties

Here are the properties for the Google Drive connector:

Property | Control | Description
Data Source Name | text box | Enter the name of the data source.
Authorize | button | Select this button to authenticate your Google account and grant Incorta read access to your Google Drive. Choose the account to use to access your Google Drive, and then select Allow. The New Data Source dialog reappears, and the Authorize button changes to Authorized, with the name of the Google account to its right.
Browse | button | Select a folder from the directories shown that contains the folder or file you want to connect to. If you do not choose a folder, you have access to all folders and files in My Drive and Shared with me. You cannot select a parent folder for a table data source.

Create a schema with the Schema Wizard

Here are the steps to create a Google Drive schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:
    • For Enter a name, enter the schema name.
    • For Select a Datasource, select the Google Drive external data source.
    • Optionally, enter a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your file.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.

Create a schema with the Schema Designer

Here are the steps to create a Google Drive schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In Name, specify the schema name, and select Save.
  • In Start adding tables to your schema, select File System.
  • In the Data Source dialog, specify the Google Drive table data source properties.
  • Select Add.
  • In the Table Editor, in the Table Summary section, enter the table name.
  • To save your changes, select Done in the Action bar.

Google Drive table data source properties

You can specify a single file or a folder in the Data Source dialog. Both the Schema Designer and the Table Editor represent a file or folder data source as a single-source table. To select a folder in My Drive, you must enable Union Files.

Note

This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.
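Conceptually, enabling Union Files combines every file in the selected folder into one table. The Python sketch below shows that idea for CSV text files that share a header, with an optional column that records each row's source file (compare the Add Filename as a column property). Processing files in name order and rejecting mismatched headers are assumptions of this sketch, not documented Loader Service behavior; `union_text_files` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List, Optional


def union_text_files(files: Dict[str, str],
                     filename_column: Optional[str] = None) -> List[Dict[str, str]]:
    """Combine CSV files that share a header into one list of rows.

    files maps each file name to its CSV content. If filename_column is set,
    every row also records the name of the file it came from.
    """
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for name in sorted(files):  # assumed order: by file name
        reader = csv.DictReader(io.StringIO(files[name]))
        fields = reader.fieldnames
        if header is None:
            header = fields
        elif fields != header:
            raise ValueError(f"{name}: header {fields} does not match {header}")
        for row in reader:
            if filename_column:
                row[filename_column] = name
            rows.append(row)
    return rows


monthly = {
    "jan.csv": "id,amount\n1,10\n",
    "feb.csv": "id,amount\n2,20\n",
}
for row in union_text_files(monthly, filename_column="source_file_name"):
    print(row)
# {'id': '2', 'amount': '20', 'source_file_name': 'feb.csv'}
# {'id': '1', 'amount': '10', 'source_file_name': 'jan.csv'}
```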

Common properties for a file and folder

Here are the common properties for both a file and a folder:

Property | Control | Description
Type | drop down list | Default is File System.
Data Source | drop down list | Select the Google Drive external data source.
File Type | drop down list | Select either Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx).
Has Header? | toggle | Enable if the first row contains column header values.
Callback | toggle | Enables the Callback URL field.
Callback URL | text box | Appears when the Callback toggle is enabled. Specify the callback URL.

Common file properties

Here are the common properties specific to selecting a file of type Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):

Property | Control | Description
Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and an Update File.
File | button | Opens the Add File from dialog, which shows the files from your Google Drive data source. Select a single file, and then select Add.
Update File | button | Available when Incremental is enabled. Opens the Add File from dialog, which shows the files from your Google Drive data source. Select a single file, and then select Add.

Note

When Incremental is enabled and no Key column is defined, new rows are appended and existing rows are not updated.
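The difference between an incremental load with and without a Key column can be sketched as an append versus an upsert. In this Python illustration, rows are dictionaries and `apply_incremental` is a hypothetical helper; it models the documented outcome, not the Loader Service implementation. As a simplification, existing rows that share a key collapse to one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]


def apply_incremental(existing: List[Row], update: List[Row],
                      key: Optional[Sequence[str]] = None) -> List[Row]:
    """Merge update rows into existing rows (hypothetical model of an incremental load).

    Without a key, every update row is appended. With a key, an update row whose
    key matches an existing row replaces it; unmatched rows are appended.
    """
    if not key:
        return existing + update

    def key_of(row: Row) -> Tuple[str, ...]:
        return tuple(row[column] for column in key)

    merged = {key_of(row): row for row in existing}
    for row in update:
        merged[key_of(row)] = row  # dicts keep first-insertion order, so matches stay in place
    return list(merged.values())


existing = [{"id": "1", "qty": "5"}, {"id": "2", "qty": "7"}]
update = [{"id": "2", "qty": "9"}, {"id": "3", "qty": "1"}]
print(len(apply_incremental(existing, update)))         # 4: rows appended, id 2 now appears twice
print(apply_incremental(existing, update, key=["id"]))  # id 2 updated in place, id 3 appended
```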

Properties for an Excel Workbook file

Here are the properties specific to an Excel Workbook (.xlsx) file:

Property | Control | Description
Worksheet | drop down list | Select a worksheet in the Excel Workbook.

Properties for a Text file

Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:

Property | Control | Description
Date Format | drop down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows.
Timestamp Format | drop down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows.
Character Set | drop down list | Select a supported character set.
Separator | drop down list | Available when the selected File Type is Text. Specify the separator between column values in a row. Comma and Tab are standard delimiters. Other requires you to specify a value, such as a colon (:).
Other | text box | Available when the Separator is Other. Enter one or more characters to use as the column separator, or delimiter, between values in a row.
Enable Chunking | toggle | Enable for large files.
Chunk Size (MB) | text box | Enter the chunk size in megabytes (MB).
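The following Python sketch illustrates two of these properties: splitting rows on a Separator (including a multi-character Other value) and an Automatic-style date format guess made by sampling the first few values. The candidate format list, the sample size of 5, and both function names are assumptions, not Incorta's detection logic. Incorta uses Java date format patterns, so each Python `strptime` pattern is annotated with its Java equivalent.

```python
import csv
import io
from datetime import datetime
from typing import List, Optional, Sequence

# Candidate date formats in Python strptime syntax, with the Java pattern each mirrors.
CANDIDATE_DATE_FORMATS = (
    "%Y-%m-%d",  # yyyy-MM-dd
    "%m/%d/%Y",  # MM/dd/yyyy
    "%d.%m.%Y",  # dd.MM.yyyy
)


def split_rows(text: str, separator: str) -> List[List[str]]:
    """Split text into rows of values using a one- or multi-character separator."""
    if len(separator) == 1:
        return list(csv.reader(io.StringIO(text), delimiter=separator))
    # The csv module only accepts single-character delimiters, so split plainly
    # (no quote handling) for multi-character Other values.
    return [line.split(separator) for line in text.splitlines() if line]


def infer_date_format(values: Sequence[str], sample_size: int = 5) -> Optional[str]:
    """Return the first candidate format that parses every sampled non-empty value."""
    sample = [value for value in values if value][:sample_size]
    if not sample:
        return None
    for fmt in CANDIDATE_DATE_FORMATS:
        try:
            for value in sample:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format fails for at least one sampled value
        return fmt
    return None


text = "id:order_date\n1:2024-03-01\n2:2024-03-15\n"
header, *data = split_rows(text, ":")
print(infer_date_format([row[1] for row in data]))  # %Y-%m-%d
```

Because the first matching candidate wins, an ambiguous sample such as 01/02/2024 resolves to MM/dd/yyyy here; choosing an explicit Date Format avoids that kind of guess.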

Common folder properties

Folder properties are available when you enable Union Files. It is not possible to select a parent folder.

Here are the properties specifically related to selecting a folder:

Property | Control | Description
Incremental | toggle | Enable to support incremental loading.
Union Files | toggle | Enable to select all files within a given folder. When enabled, you can only select a folder from your Google Drive data source.
Directory | button | Select a folder from your Google Drive data source. You cannot select a parent folder.
Include | text box | Enter a keyword with the * wildcard to include only matching files within the folder.
Exclude | text box | Enter a keyword with the * wildcard to exclude matching files within the folder.
Include Sub-Directories Files | toggle | Enable to include files from sub-folders.
Add Filename as a column | toggle | Enable to add the name of each file as a column. You must then specify a column name.
Filename column | text box | Enter a name for the filename column, such as source_file_name.

Note

When Incremental is enabled and no Key column is defined, new rows are appended and existing rows are not updated.
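The Include, Exclude, and Include Sub-Directories Files properties act as a file filter on the selected folder. This Python sketch applies `*` wildcard patterns to file names with `fnmatch`. It assumes case-sensitive matching, that Exclude wins when both patterns match, and that paths are relative to the selected folder with `/` separators; `select_files` is hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Iterable, List, Optional


def select_files(paths: Iterable[str], include: Optional[str] = None,
                 exclude: Optional[str] = None, include_subdirs: bool = False) -> List[str]:
    """Filter folder-relative paths using Include/Exclude wildcard patterns."""
    selected = []
    for path in paths:
        if not include_subdirs and "/" in path:
            continue  # file lives in a sub-folder
        name = basename(path)
        if include and not fnmatchcase(name, include):
            continue  # does not match the Include pattern
        if exclude and fnmatchcase(name, exclude):
            continue  # matches the Exclude pattern
        selected.append(path)
    return selected


paths = ["sales_jan.csv", "sales_feb.csv", "sales_feb_draft.csv", "notes.txt", "2023/sales_dec.csv"]
print(select_files(paths, include="sales_*", exclude="*draft*"))
# ['sales_jan.csv', 'sales_feb.csv']
```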

Folder properties for Excel Workbook files

Important

This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.

Here are the properties specific to selecting a folder whose File Type is Excel Workbook (.xlsx):

Property | Control | Description
Worksheet | drop down list | Select the worksheet tab name that all Excel Workbook files in the folder share.
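To choose a Worksheet name that every workbook in the folder shares, you can intersect the sheet names of each file. This Python sketch takes a mapping of workbook names to sheet names (which you could read with a library such as openpyxl, through `load_workbook(path, read_only=True).sheetnames`) and reports the common names and any workbooks that lack the chosen one. Both function names are hypothetical.

```python
from typing import Dict, List


def common_worksheets(sheets_by_file: Dict[str, List[str]]) -> List[str]:
    """Worksheet names that appear in every workbook."""
    sheet_sets = [set(sheets) for sheets in sheets_by_file.values()]
    if not sheet_sets:
        return []
    return sorted(set.intersection(*sheet_sets))


def workbooks_missing(sheets_by_file: Dict[str, List[str]], worksheet: str) -> List[str]:
    """Workbooks that do not contain the given worksheet name."""
    return sorted(name for name, sheets in sheets_by_file.items() if worksheet not in sheets)


workbooks = {"q1.xlsx": ["Summary", "Data"], "q2.xlsx": ["Data", "Notes"]}
print(common_worksheets(workbooks))             # ['Data']
print(workbooks_missing(workbooks, "Summary"))  # ['q2.xlsx']
```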

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the Google Drive schema.
  • In the Schema Designer, in the Action bar, select Diagram.

Load the schema

Here are the steps to perform a Full Load of the Google Drive schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the Google Drive schema.
  • In the Schema Designer, in the Action bar, select Load → Load Now → Full.
  • To review the load status, in Last Load Status, select the date.

Explore the schema

With the full load of the Google Drive schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the Google Drive schema.
  • In the Schema Designer, in the Action bar, select Explore Data.