# BigQuery
This guide covers how to configure Google BigQuery as a warehouse in SignalSmith, including project setup, service account creation, and required IAM permissions.
## Prerequisites
- A Google Cloud project with BigQuery enabled
- A service account with read access to the target dataset(s)
- The BigQuery API enabled in your GCP project
## Connection Configuration

### Required Fields
| Field | Description | Example |
|---|---|---|
| Project ID | The Google Cloud project containing your BigQuery datasets | my-company-analytics |
| Dataset | The default BigQuery dataset to query | customer_data |
| Service Account JSON | The full JSON key file for the service account | (see below) |
### Optional Fields
| Field | Description | Default |
|---|---|---|
| Location | The BigQuery dataset location | Auto-detected |
### Project ID
The project ID is the unique identifier for your Google Cloud project. You can find it in the GCP Console under IAM & Admin > Settings, or in the project selector dropdown.
Note: Use the project ID (e.g., my-company-analytics), not the project name (e.g., “My Company Analytics”) or project number (e.g., 123456789012).
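The three identifier kinds can usually be told apart by shape alone. As a rough sanity check (a sketch, not part of SignalSmith — the pattern follows GCP's documented project ID constraints: 6–30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen):

```python
import re

# GCP project IDs: 6-30 chars, lowercase letters/digits/hyphens,
# must start with a letter and cannot end with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def is_valid_project_id(value: str) -> bool:
    """Rough shape check: True for IDs like 'my-company-analytics',
    False for display names or numeric project numbers."""
    return bool(_PROJECT_ID_RE.fullmatch(value))
```

A display name like "My Company Analytics" (spaces, uppercase) or a project number like 123456789012 (all digits) fails this check, which is a quick way to catch the most common misconfiguration.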
### Dataset
The default dataset that SignalSmith will query. Models can reference tables in other datasets using fully-qualified names (e.g., project.dataset.table), but the default dataset is used when table names are unqualified.
BigQuery dataset names are case-sensitive. When using backtick-quoted identifiers, SignalSmith normalizes them to lowercase to match BigQuery naming conventions.
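The resolution rule for unqualified names can be illustrated with a small sketch (a hypothetical helper, not SignalSmith's actual implementation): a bare table name picks up the configured project and default dataset, a `dataset.table` reference picks up only the project, and a fully qualified name passes through unchanged.

```python
def qualify(table: str, project: str, default_dataset: str) -> str:
    """Resolve a table reference against the connection defaults.

    'users'               -> '<project>.<default_dataset>.users'
    'transactions.orders' -> '<project>.transactions.orders'
    'proj.dataset.table'  -> unchanged (already fully qualified)
    """
    parts = table.split(".")
    if len(parts) == 1:               # bare table name
        return f"{project}.{default_dataset}.{table}"
    if len(parts) == 2:               # dataset.table
        return f"{project}.{table}"
    return table                      # project.dataset.table
```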
## Service Account Setup
SignalSmith authenticates with BigQuery using a GCP service account. Follow these steps to create one.
### Step 1: Create the Service Account
```bash
# Using gcloud CLI
gcloud iam service-accounts create signalsmith-reader \
  --project=my-company-analytics \
  --display-name="SignalSmith Reader" \
  --description="Read-only access for SignalSmith CDP"
```

Or in the GCP Console:
- Go to IAM & Admin > Service Accounts
- Click Create Service Account
- Name it `signalsmith-reader`
- Click Create and Continue
### Step 2: Grant Required Roles
The service account needs the following IAM roles:
| Role | Purpose |
|---|---|
| roles/bigquery.dataViewer | Read access to tables and views in the dataset |
| roles/bigquery.jobUser | Permission to run BigQuery jobs (queries) |
```bash
# Grant BigQuery Data Viewer on the dataset
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Grant BigQuery Job User (needed to run queries)
gcloud projects add-iam-policy-binding my-company-analytics \
  --member="serviceAccount:signalsmith-reader@my-company-analytics.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```

For more granular control, you can grant roles/bigquery.dataViewer at the dataset level instead of the project level:
```bash
# Dataset-level access (more restrictive)
bq update \
  --source=dataset_policy.json \
  my-company-analytics:customer_data
```

Where dataset_policy.json includes:
```json
{
  "access": [
    {
      "role": "READER",
      "userByEmail": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com"
    }
  ]
}
```

### Step 3: Create and Download the Key
```bash
gcloud iam service-accounts keys create signalsmith-key.json \
  --iam-account=signalsmith-reader@my-company-analytics.iam.gserviceaccount.com
```

This generates a JSON file containing the service account credentials. The file looks like:
```json
{
  "type": "service_account",
  "project_id": "my-company-analytics",
  "private_key_id": "key-id-here",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "signalsmith-reader@my-company-analytics.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```

### Step 4: Provide the Key to SignalSmith
In the SignalSmith UI, paste the entire contents of the JSON key file into the Service Account JSON field. SignalSmith encrypts and securely stores this key.
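Before pasting, it can help to confirm the file is a complete service account key rather than, say, a truncated copy or an OAuth client secret. A minimal pre-flight check (a sketch using field names from the key layout shown above, not a SignalSmith feature):

```python
import json

# A subset of the fields every service account key file contains.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def check_key(raw: str) -> list[str]:
    """Return a list of problems with a pasted key; an empty list means it
    looks like a complete service account key."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON (was the file truncated when copying?)"]
    problems = []
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if key.get("type") != "service_account":
        problems.append(f"'type' is {key.get('type')!r}, expected 'service_account'")
    return problems
```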
## Data Types
BigQuery has some unique data types that SignalSmith handles:
| BigQuery Type | SignalSmith Handling |
|---|---|
| STRING | Mapped as text |
| INT64, FLOAT64, NUMERIC | Mapped as number |
| BOOL | Mapped as boolean |
| TIMESTAMP, DATETIME, DATE | Mapped as date/datetime |
| JSON | Mapped as JSON object; uses PARSE_JSON() for writing |
| ARRAY | Supported in model queries; flattened for sync |
| STRUCT | Supported in model queries; flattened for sync |
| GEOGRAPHY | Supported as text (WKT format) |
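The table above can be sketched as a simple lookup (an illustration only — the real mapping lives inside SignalSmith, and this simplification treats the date types uniformly and reduces parameterized types like NUMERIC(10, 2) or ARRAY&lt;STRING&gt; to their base type):

```python
# Scalar-type mapping per the table above (illustrative, not SignalSmith code).
TYPE_MAP = {
    "STRING": "text",
    "INT64": "number",
    "FLOAT64": "number",
    "NUMERIC": "number",
    "BOOL": "boolean",
    "TIMESTAMP": "datetime",
    "DATETIME": "datetime",
    "DATE": "datetime",
    "JSON": "json",
    "GEOGRAPHY": "text",  # surfaced as WKT text
}

def map_type(bq_type: str) -> str:
    # Strip type parameters: ARRAY<STRING> -> ARRAY, NUMERIC(10, 2) -> NUMERIC
    base = bq_type.split("<")[0].split("(")[0].strip().upper()
    if base in ("ARRAY", "STRUCT"):
        return "flattened"  # flattened for sync, per the table
    return TYPE_MAP.get(base, "text")
```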
## Cross-Dataset Queries
Models can query tables across multiple datasets within the same project, or even across projects:
```sql
-- Same project, different dataset
SELECT u.user_id, u.email, o.total_amount
FROM `customer_data.users` u
JOIN `transactions.orders` o ON u.user_id = o.user_id

-- Cross-project query (service account must have access to both projects)
SELECT *
FROM `other-project.dataset.table`
```

Ensure the service account has bigquery.dataViewer access to all referenced datasets.
## Example Configuration
```bash
curl -X POST https://your-workspace.signalsmith.dev/api/v1/sources \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Analytics BigQuery",
    "type": "bigquery",
    "config": {
      "project_id": "my-company-analytics",
      "dataset": "customer_data",
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-company-analytics\",...}"
    }
  }'
```

## Cost Considerations
BigQuery charges based on the amount of data scanned by queries. To manage costs:
- Use partitioned tables — Partition on date columns and filter by partition in your model SQL to reduce data scanned
- Use clustered tables — Clustering on frequently filtered columns improves query performance and reduces cost
- Limit preview queries — Model previews run your SQL with a `LIMIT` clause, but wide `SELECT *` queries still scan all columns
- Monitor with BigQuery audit logs — Track queries from the SignalSmith service account in Cloud Logging
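To put rough numbers on scan-based pricing, here is a back-of-the-envelope estimator (assuming BigQuery's on-demand rate of $6.25 per TiB scanned — verify the current rate for your region in GCP pricing, as it changes):

```python
PRICE_PER_TIB_USD = 6.25  # assumed on-demand rate; check current GCP pricing
TIB = 2 ** 40             # bytes per tebibyte

def estimated_query_cost(bytes_scanned: int) -> float:
    """Rough on-demand cost in USD for a query scanning `bytes_scanned` bytes."""
    return bytes_scanned / TIB * PRICE_PER_TIB_USD

# Example: a model that scans 500 GiB per run, scheduled hourly
per_run = estimated_query_cost(500 * 2 ** 30)  # ~ $3.05 per run
daily = per_run * 24                           # ~ $73.24 per day
```

Numbers like these are why partition filters matter: restricting the example model above to 30 days of partitions scans a fraction of the table on every run.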
```sql
-- Cost-efficient model: uses partition filter
SELECT customer_id, email, lifetime_value
FROM `customer_data.customers`
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
```

## Troubleshooting
| Issue | Solution |
|---|---|
| "Service account key is invalid" | Ensure you pasted the complete JSON file contents, including all fields |
| "Permission denied on dataset" | Grant bigquery.dataViewer role to the service account on the target dataset |
| "BigQuery API not enabled" | Enable the BigQuery API in APIs & Services > Enabled APIs in GCP Console |
| "Quota exceeded" | Check BigQuery quotas in GCP Console; consider requesting a quota increase |
| "Dataset not found" | Verify the dataset name is correct (case-sensitive) and exists in the specified project |
| "Access Denied: Project not found" | Verify the project ID (not the display name or number) |
## Next Steps
- Create a model using your BigQuery warehouse
- Use the SQL editor to write and preview queries
- Set up a sync to activate your data