BigQuery

This guide covers how to configure Google BigQuery as a warehouse in SignalSmith, including project setup, service account creation, and required IAM permissions.

Prerequisites

  • A Google Cloud project with BigQuery enabled
  • A service account with read access to the target dataset(s)
  • The BigQuery API enabled in your GCP project

Connection Configuration

Required Fields

| Field | Description | Example |
| --- | --- | --- |
| Project ID | The Google Cloud project containing your BigQuery datasets | my-company-analytics |
| Dataset | The default BigQuery dataset to query | customer_data |
| Service Account JSON | The full JSON key file for the service account | (see below) |

Optional Fields

| Field | Description | Default |
| --- | --- | --- |
| Location | The BigQuery dataset location | Auto-detected |

Project ID

The project ID is the unique identifier for your Google Cloud project. You can find it in the GCP Console under IAM & Admin > Settings, or in the project selector dropdown.

Note: Use the project ID (e.g., my-company-analytics), not the project name (e.g., “My Company Analytics”) or project number (e.g., 123456789012).

Dataset

The default dataset that SignalSmith will query. Models can reference tables in other datasets using fully-qualified names (e.g., project.dataset.table), but the default dataset is used when table names are unqualified.
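
For example, assuming a hypothetical users table, the two queries below are equivalent when customer_data is the configured default dataset:

```sql
-- Unqualified: resolved against the default dataset (customer_data)
SELECT user_id, email FROM users;

-- Fully qualified: works regardless of the configured default dataset
SELECT user_id, email FROM `my-company-analytics.customer_data.users`;
```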

BigQuery dataset names are case-sensitive. SignalSmith backtick-quotes identifiers rather than normalizing them to lowercase, so enter the dataset name exactly as it appears in BigQuery.

Service Account Setup

SignalSmith authenticates with BigQuery using a GCP service account. Follow these steps to create one.

Step 1: Create the Service Account

# Using gcloud CLI
gcloud iam service-accounts create signalsmith-reader \
  --project=my-company-analytics \
  --display-name="SignalSmith Reader" \
  --description="Read-only access for SignalSmith CDP"

Or in the GCP Console:

  1. Go to IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Name it signalsmith-reader
  4. Click Create and Continue

Step 2: Grant Required Roles

The service account needs the following IAM roles:

| Role | Purpose |
| --- | --- |
| roles/bigquery.dataViewer | Read access to tables and views in the dataset |
| roles/bigquery.jobUser | Permission to run BigQuery jobs (queries) |

# Grant BigQuery Data Viewer on the dataset
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
 
# Grant BigQuery Job User (needed to run queries)
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

For more granular control, you can grant roles/bigquery.dataViewer at the dataset level instead of the project level:

# Dataset-level access (more restrictive)
bq update \
  --source=dataset_policy.json \
  my-company-analytics:customer_data

Where dataset_policy.json includes:

{
  "access": [
    {
      "role": "READER",
      "userByEmail": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com"
    }
  ]
}

Step 3: Create and Download the Key

gcloud iam service-accounts keys create signalsmith-key.json \
  --iam-account=signalsmith-reader@my-company-analytics.iam.gserviceaccount.com

This generates a JSON file containing the service account credentials. The file looks like:

{
  "type": "service_account",
  "project_id": "my-company-analytics",
  "private_key_id": "key-id-here",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}

Step 4: Provide the Key to SignalSmith

In the SignalSmith UI, paste the entire contents of the JSON key file into the Service Account JSON field. SignalSmith encrypts and securely stores this key.
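
Before pasting, it can help to sanity-check that the file is a complete key. A minimal sketch in plain Python (standard library only; the required-field list is an assumption based on the key format shown above, not a SignalSmith API):

```python
import json

# Fields a service-account key file is expected to contain (see the
# example key above). This list is illustrative, not exhaustive.
REQUIRED_FIELDS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "token_uri",
}

def validate_service_account_key(raw: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty list = OK)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append("'type' must be 'service_account'")
    return problems

# A truncated paste (a common mistake) is flagged:
partial = '{"type": "service_account", "project_id": "my-company-analytics"}'
print(validate_service_account_key(partial))
```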

Data Types

BigQuery has some unique data types that SignalSmith handles:

| BigQuery Type | SignalSmith Handling |
| --- | --- |
| STRING | Mapped as text |
| INT64, FLOAT64, NUMERIC | Mapped as number |
| BOOL | Mapped as boolean |
| TIMESTAMP, DATETIME, DATE | Mapped as date/datetime |
| JSON | Mapped as JSON object; uses PARSE_JSON() for writing |
| ARRAY | Supported in model queries; flattened for sync |
| STRUCT | Supported in model queries; flattened for sync |
| GEOGRAPHY | Supported as text (WKT format) |
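
The "flattened for sync" handling of STRUCT and ARRAY values can be pictured with a small sketch; the dotted-key naming below is illustrative only, not necessarily SignalSmith's exact column naming:

```python
def flatten(value, prefix=""):
    """Flatten nested dicts (STRUCTs) and lists (ARRAYs) into dotted keys."""
    out = {}
    if isinstance(value, dict):        # STRUCT -> one key per field
        for key, sub in value.items():
            out.update(flatten(sub, f"{prefix}{key}."))
    elif isinstance(value, list):      # ARRAY -> one key per element index
        for i, sub in enumerate(value):
            out.update(flatten(sub, f"{prefix}{i}."))
    else:                              # scalar leaf value
        out[prefix.rstrip(".")] = value
    return out

row = {"user_id": 7, "address": {"city": "Berlin", "geo": {"lat": 52.5}},
       "tags": ["vip", "beta"]}
print(flatten(row))
# {'user_id': 7, 'address.city': 'Berlin', 'address.geo.lat': 52.5,
#  'tags.0': 'vip', 'tags.1': 'beta'}
```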

Cross-Dataset Queries

Models can query tables across multiple datasets within the same project, or even across projects:

-- Same project, different dataset
SELECT u.user_id, u.email, o.total_amount
FROM `customer_data.users` u
JOIN `transactions.orders` o ON u.user_id = o.user_id
 
-- Cross-project query (service account must have access to both projects)
SELECT *
FROM `other-project.dataset.table`

Ensure the service account has bigquery.dataViewer access to all referenced datasets.

Example Configuration

curl -X POST https://your-workspace.signalsmith.dev/api/v1/sources \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Analytics BigQuery",
    "type": "bigquery",
    "config": {
      "project_id": "my-company-analytics",
      "dataset": "customer_data",
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-company-analytics\",...}"
    }
  }'
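
Escaping the key file into the credentials_json string is easy to get wrong in a hand-written curl body. A Python sketch that builds the same request payload (the abbreviated key dict stands in for the real file contents):

```python
import json

# Hypothetical key dict, as loaded from signalsmith-key.json (Step 3).
key = {
    "type": "service_account",
    "project_id": "my-company-analytics",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com",
}

payload = {
    "name": "Analytics BigQuery",
    "type": "bigquery",
    "config": {
        "project_id": key["project_id"],
        "dataset": "customer_data",
        # json.dumps produces the escaped string form the API expects,
        # avoiding hand-escaping quotes and newlines in the curl -d body.
        "credentials_json": json.dumps(key),
    },
}
body = json.dumps(payload)  # ready to send as the POST request body
print(body[:60] + "...")
```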

Cost Considerations

BigQuery charges based on the amount of data scanned by queries. To manage costs:

  • Use partitioned tables — Partition on date columns and filter by partition in your model SQL to reduce data scanned
  • Use clustered tables — Clustering on frequently filtered columns improves query performance and reduces cost
  • Limit preview queries — Model previews run your SQL with a LIMIT clause, but wide SELECT * queries still scan all columns
  • Monitor with BigQuery audit logs — Track queries from the SignalSmith service account in Cloud Logging

-- Cost-efficient model: uses partition filter
SELECT customer_id, email, lifetime_value
FROM `customer_data.customers`
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

Troubleshooting

| Issue | Solution |
| --- | --- |
| "Service account key is invalid" | Ensure you pasted the complete JSON file contents, including all fields |
| "Permission denied on dataset" | Grant the bigquery.dataViewer role to the service account on the target dataset |
| "BigQuery API not enabled" | Enable the BigQuery API in APIs & Services > Enabled APIs in the GCP Console |
| "Quota exceeded" | Check BigQuery quotas in the GCP Console; consider requesting a quota increase |
| "Dataset not found" | Verify the dataset name is correct (case-sensitive) and exists in the specified project |
| "Access Denied: Project not found" | Verify the project ID (not the display name or number) |

Next Steps