Connectors → Google BigQuery
About GoogleBigQuery
Google BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service that supports querying using ANSI SQL. It also has built-in machine learning capabilities.
About the Google BigQuery Connector
The Google BigQuery Connector enables Incorta to access data stored in Google BigQuery data sets. You can access the data you want with a standard SQL query. The Google BigQuery Connector uses the com.simba.googlebigquery.jdbc42.Driver
driver.
The Google BigQuery connector supports the following Incorta specific functionality:
Feature | Supported |
---|---|
Chunking | ✔ |
Data Agent | |
Encryption at Ingest | |
Incremental Load | ✔ |
Multi-Source | ✔ |
OAuth | |
Performance Optimized | ✔ |
Remote | |
Single-Source | ✔ |
Spark Extraction | |
Webhook Callbacks | ✔ |
Security configurations for the Google BigQuery connector
Prerequisites for use of the BigQuery connector are the following, which are typically handled by a Security Administrator or System Administrator who manages your organization’s G Suite accounts:
- Create a G Suite Account
- Within the G Suite Account, create a Google Cloud Project
- Within the Google Cloud Project,
- Enable the BigQuery API
- Create the BigQuery dataset(s)
The Security Administrator or System Administrator must then create a service account key. A service account key can be created with the following steps:
- In the Cloud Console, go to the Create service account key page.
- From the Service account list, select New service account.
- In the Service account name field, enter a name.
- From the Role list, select a role.
- Select Create. A JSON file that contains your key downloads to your computer. The Incorta Administrator must copy this JSON to the Incorta server. The full path to this file will be entered in the connector properties.
The Role field affects which resources your service account can access in your project. You can revoke these roles or grant additional roles later. In production environments, do not grant the Owner, Editor, or Viewer roles. For more information, see Manage access to projects, folders, and organizations.
Steps to connect Google BigQuery and Incorta
To connect Google BigQuery and Incorta, here are the high level steps, tools, and procedures:
- Create an external data source
- Create a schema with the Schema Wizard
- or, Create a schema with the Schema Designer
- Load the schema
- Explore the schema
Create an external data source
Here are the steps to create a external data source with the Google BigQuery connector:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Source.
- In the Choose a Data Source dialog, in Query service, select BigQuery.
- In the New Data Source dialog, specify the applicable connector properties.
- To test, select Test Connection.
- Select Ok to save your changes.
Google BigQuery connector properties
Here are the properties for the Google BigQuery connector:
Property | Control | Description |
---|---|---|
Data Source Name | text box | Enter the name of the data source |
Project ID | text box | Enter the Google Cloud Project ID |
Private Key Path | text box | Enter the full path to the Google BigQuery private key file on the Incorta server. For example, home/incorta/IncortaAnalytics/ IncortaNode/runtime/lib/ bigqueryfullkey.json |
Connection Pool | text box | Enter the connection pool. The default is 30. |
Connection Properties | text box | Optionally enter connector properties for a custom connection to Google BigQuery in the format: propertyName=propertyValue , where each connector property is on a new line.The available connector properties are specified by com.simba.googlebigquery.jdbc42.Driver . Refer to the Google BigQuery Driver Configuration Options for further information. |
Create a schema with the Schema Wizard
Here are the steps to create an Google BigQuery schema with the Schema Wizard:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Schema Wizard
- In (1) Choose a Source, specify the following:
- For Enter a name, enter the schema name.
- For Select a Datasource, select the Google BigQuery external data source.
- Optionally create a description.
- In the Schema Wizard footer, select Next.
- In (2) Manage Tables, in the Data Panel, navigate the directory tree as necessary to select the Google BigQuery files. You can either check the Select All checkbox or select individual sheets.
- In the Schema Wizard footer, select Next.
- In (3) Finalize, in the Schema Wizard footer, select Create Schema.
Create a schema with the Schema Designer
Here are the steps to create an Google BigQuery schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Create Schema.
- In Name, specify the schema name, and select Save.
- In Start adding tables to your schema, select SQL Database.
- In the Data Source dialog, specify the Google BigQuery table data source properties.
- Select Add.
- In the Table Editor, in the Table Summary section, enter the table name.
- To save your changes, select Done in the Action bar.
Google BigQuery table data source properties
For a schema table in Incorta, you can define the following Google BigQuery specific data source properties as follows:
Property | Control | Description |
---|---|---|
Type | drop down list | Default is SQL Database |
Data Source | drop down list | Select the Google BigQuery external data source |
Incremental | toggle | Enable the incremental load configuration for the schema table |
Query | text box | Enter the SQL query to retrieve data from the Google BigQuery dataset |
Update Query | text box | Enable Incremental to configure this property. Enter the SQL query to retrieve data updates from the Google BigQuery dataset. |
Timestamp Field Type | drop down list | Enable Incremental to configure this property. Select the format of the table date column: ● Timestamp ● Unix Epoch (seconds) ● Unix Epoch (milliseconds) |
Fetch Size | text box | Used for performance improvement, fetch size defines the number of records that will be retrieved from the database in each batch until all records are retrieved. The default is 5000. |
Chunking Method | drop down list | Chunking methods allow for parallel extraction of large tables. The default is No Chunking. There are two chunking methods: ● By Size of Chunking (Single Table) ● By Date/Timestamp |
Chunk Size | text box | Select By Size of Chunking for the Chunking Method to set this property. Enter the number of records to extract in each chunk in relation to the Fetch Size. The default is 3 times the Fetch Size. |
Order Column | drop down list | Select By Size of Chunking for the Chunking Method to set this property. Select a column in the source table you want to order by before chunking. It's typically an ID column and it must be numeric. |
Upper Bound for Order Column | text box | Optional. Enter the maximum value for the order column. |
Lower Bound for Order Column | text box | Optional. Enter the minimum value for the order column. |
Order Column [Date/Timestamp] | drop down list | Select By Date/Timestamp for the Chunking Method to set this property. Select a column in the source table you want to order by before chunking. It should be a Date/Timestamp column. |
Chunk Period | drop down list | Select the chunk period that will be used in dividing chunks: ● Daily ● Weekly (default) ● Monthly ● Yearly ● Custom |
Number of days | text box | Select Custom for the Chunk Period to set this property. Enter the chunking period in days |
Callback | toggle | Enable this option to call back on the source data set |
Callback URL | text box | Enable Callback to configure this property. Specify the URL. |
View the schema diagram with the Schema Diagram Viewer
Here are the steps to view the schema diagram using the Schema Diagram Viewer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Google BigQuery schema.
- In the Schema Designer, in the Action bar, select Diagram.
Load the schema
Here are the steps to perform a Full Load of the Google BigQuery schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Google BigQuery schema.
- In the Schema Designer, in the Action bar, select Load → Load Now → Full.
- To review the load status, in Last Load Status, select the date.
Explore the schema
With the full load of the Google BigQuery schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.
To open the Analyzer from the schema, follow these steps:
- In the Navigation bar, select Schema.
- In the Schema Manager, in the List view, select the Google BigQuery schema.
- In the Schema Designer, in the Action bar, select Explore Data.