BigQuery

This guide covers how to configure Google BigQuery as a warehouse in SignalSmith, including project setup, service account creation, and required IAM permissions.

Prerequisites

  • A Google Cloud project with BigQuery enabled
  • A service account with read access to the target dataset(s)
  • The BigQuery API enabled in your GCP project

Connection Configuration

Required Fields

| Field | Description | Example |
| --- | --- | --- |
| Project ID | The Google Cloud project containing your BigQuery datasets | my-company-analytics |
| Dataset | The default BigQuery dataset to query | customer_data |
| Service Account JSON | The full JSON key file for the service account | (see below) |

Optional Fields

| Field | Description | Default |
| --- | --- | --- |
| Location | The BigQuery dataset location | Auto-detected |

Project ID

The project ID is the unique identifier for your Google Cloud project. You can find it in the GCP Console under IAM & Admin > Settings, or in the project selector dropdown.

Note: Use the project ID (e.g., my-company-analytics), not the project name (e.g., “My Company Analytics”) or project number (e.g., 123456789012).

Dataset

The default dataset that SignalSmith will query. Models can reference tables in other datasets using fully-qualified names (e.g., project.dataset.table), but the default dataset is used when table names are unqualified.
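
For example, assuming a hypothetical users table, the two queries below are equivalent when customer_data is the configured default dataset:

```sql
-- Unqualified: resolved against the default dataset (customer_data)
SELECT user_id, email FROM users;

-- Fully qualified: works regardless of the configured default dataset
SELECT user_id, email FROM `my-company-analytics.customer_data.users`;
```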

BigQuery dataset names are case-sensitive. SignalSmith backtick-quotes identifiers rather than normalizing them to lowercase, so enter the dataset name exactly as it appears in BigQuery.

Service Account Setup

SignalSmith authenticates with BigQuery using a GCP service account. Follow these steps to create one.

Step 1: Create the Service Account

# Using gcloud CLI
gcloud iam service-accounts create signalsmith-reader \
  --project=my-company-analytics \
  --display-name="SignalSmith Reader" \
  --description="Read-only access for SignalSmith CDP"

Or in the GCP Console:

  1. Go to IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Name it signalsmith-reader
  4. Click Create and Continue

Step 2: Grant Required Roles

The service account needs the following IAM roles:

| Role | Purpose |
| --- | --- |
| roles/bigquery.dataViewer | Read access to tables and views in the dataset |
| roles/bigquery.jobUser | Permission to run BigQuery jobs (queries) |

# Grant BigQuery Data Viewer on the dataset
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
 
# Grant BigQuery Job User (needed to run queries)
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

For more granular control, you can grant roles/bigquery.dataViewer at the dataset level instead of the project level:

# Dataset-level access (more restrictive)
bq update \
  --source=dataset_policy.json \
  my-company-analytics:customer_data

Where dataset_policy.json includes:

{
  "access": [
    {
      "role": "READER",
      "userByEmail": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com"
    }
  ]
}

Step 3: Create and Download the Key

gcloud iam service-accounts keys create signalsmith-key.json \
  --iam-account=signalsmith-reader@my-company-analytics.iam.gserviceaccount.com

This generates a JSON file containing the service account credentials. The file looks like:

{
  "type": "service_account",
  "project_id": "my-company-analytics",
  "private_key_id": "key-id-here",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}

Step 4: Provide the Key to SignalSmith

In the SignalSmith UI, paste the entire contents of the JSON key file into the Service Account JSON field. SignalSmith encrypts and securely stores this key.
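
Before pasting, it can help to sanity-check that the file is a complete key. A minimal sketch in plain Python (standard library only; the required-field list is an assumption based on the key format shown above, not a SignalSmith API):

```python
import json

# Fields a service-account key file is expected to contain (see the
# example key above). This list is illustrative, not exhaustive.
REQUIRED_FIELDS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "token_uri",
}

def validate_service_account_key(raw: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty list = OK)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append("'type' must be 'service_account'")
    return problems

# A truncated paste (a common mistake) is flagged:
partial = '{"type": "service_account", "project_id": "my-company-analytics"}'
print(validate_service_account_key(partial))
```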

Data Types

BigQuery has some unique data types that SignalSmith handles:

| BigQuery Type | SignalSmith Handling |
| --- | --- |
| STRING | Mapped as text |
| INT64, FLOAT64, NUMERIC | Mapped as number |
| BOOL | Mapped as boolean |
| TIMESTAMP, DATETIME, DATE | Mapped as date/datetime |
| JSON | Mapped as JSON object; uses PARSE_JSON() for writing |
| ARRAY | Supported in model queries; flattened for sync |
| STRUCT | Supported in model queries; flattened for sync |
| GEOGRAPHY | Supported as text (WKT format) |
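
The "flattened for sync" handling of STRUCT and ARRAY values can be pictured with a small sketch; the dotted-key naming below is illustrative only, not necessarily SignalSmith's exact column naming:

```python
def flatten(value, prefix=""):
    """Flatten nested dicts (STRUCTs) and lists (ARRAYs) into dotted keys."""
    out = {}
    if isinstance(value, dict):        # STRUCT -> one key per field
        for key, sub in value.items():
            out.update(flatten(sub, f"{prefix}{key}."))
    elif isinstance(value, list):      # ARRAY -> one key per element index
        for i, sub in enumerate(value):
            out.update(flatten(sub, f"{prefix}{i}."))
    else:                              # scalar leaf value
        out[prefix.rstrip(".")] = value
    return out

row = {"user_id": 7, "address": {"city": "Berlin", "geo": {"lat": 52.5}},
       "tags": ["vip", "beta"]}
print(flatten(row))
# {'user_id': 7, 'address.city': 'Berlin', 'address.geo.lat': 52.5,
#  'tags.0': 'vip', 'tags.1': 'beta'}
```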

Cross-Dataset Queries

Models can query tables across multiple datasets within the same project, or even across projects:

-- Same project, different dataset
SELECT u.user_id, u.email, o.total_amount
FROM `customer_data.users` u
JOIN `transactions.orders` o ON u.user_id = o.user_id
 
-- Cross-project query (service account must have access to both projects)
SELECT *
FROM `other-project.dataset.table`

Ensure the service account has bigquery.dataViewer access to all referenced datasets.

Example Configuration

curl -X POST https://your-workspace.signalsmith.dev/api/v1/sources \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Analytics BigQuery",
    "type": "bigquery",
    "config": {
      "project_id": "my-company-analytics",
      "dataset": "customer_data",
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-company-analytics\",...}"
    }
  }'
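
Escaping the key file into the credentials_json string is easy to get wrong in a hand-written curl body. A Python sketch that builds the same request payload (the abbreviated key dict stands in for the real file contents):

```python
import json

# Hypothetical key dict, as loaded from signalsmith-key.json (Step 3).
key = {
    "type": "service_account",
    "project_id": "my-company-analytics",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com",
}

payload = {
    "name": "Analytics BigQuery",
    "type": "bigquery",
    "config": {
        "project_id": key["project_id"],
        "dataset": "customer_data",
        # json.dumps produces the escaped string form the API expects,
        # avoiding hand-escaping quotes and newlines in the curl -d body.
        "credentials_json": json.dumps(key),
    },
}
body = json.dumps(payload)  # ready to send as the POST request body
print(body[:60] + "...")
```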

Cost Considerations

BigQuery charges based on the amount of data scanned by queries. To manage costs:

  • Use partitioned tables — Partition on date columns and filter by partition in your model SQL to reduce data scanned
  • Use clustered tables — Clustering on frequently filtered columns improves query performance and reduces cost
  • Limit preview queries — Model previews run your SQL with a LIMIT clause, but wide SELECT * queries still scan all columns
  • Monitor with BigQuery audit logs — Track queries from the SignalSmith service account in Cloud Logging

-- Cost-efficient model: uses partition filter
SELECT customer_id, email, lifetime_value
FROM `customer_data.customers`
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

Troubleshooting

| Issue | Solution |
| --- | --- |
| "Service account key is invalid" | Ensure you pasted the complete JSON file contents, including all fields |
| "Permission denied on dataset" | Grant the bigquery.dataViewer role to the service account on the target dataset |
| "BigQuery API not enabled" | Enable the BigQuery API in APIs & Services > Enabled APIs in the GCP Console |
| "Quota exceeded" | Check BigQuery quotas in the GCP Console; consider requesting a quota increase |
| "Dataset not found" | Verify the dataset name is correct (case-sensitive) and exists in the specified project |
| "Access Denied: Project not found" | Verify the project ID (not the display name or number) |

Next Steps