Google Sheets Loader

The Google Sheets loader pulls spreadsheet data from Google Sheets into your data warehouse. It allows you to use Google Sheets as a lightweight data source for reference tables, manual data entry, or ad hoc datasets that don’t live in a traditional database.

Prerequisites

A Google account with access to the spreadsheets you want to sync
The spreadsheet must be organized with headers in the first row and data in subsequent rows
A connected Warehouse (target warehouse) with write permissions on the target schema

Authentication

The Google Sheets loader uses OAuth 2.0 for authentication.

OAuth 2.0 Setup

In SignalSmith, click Add Loader and select Google Sheets
Click Connect with Google
You’ll be redirected to Google’s sign-in page
Sign in with the Google account that has access to your spreadsheets
Grant SignalSmith permission to read your Google Sheets data
You’ll be redirected back to SignalSmith with the connection established

SignalSmith requests the following OAuth scope:

Scope	Purpose
`https://www.googleapis.com/auth/spreadsheets.readonly`	Read-only access to Google Sheets
`https://www.googleapis.com/auth/drive.readonly`	Read-only access to discover spreadsheets in Google Drive

SignalSmith never modifies your spreadsheets. All access is strictly read-only.

Available Objects

Each spreadsheet tab (worksheet) is treated as a separate object. After connecting, you select which spreadsheets and tabs to sync.

Object	Description	Default Sync Mode
Spreadsheet Tab	A single worksheet within a Google Sheets spreadsheet	Full Refresh

How Tabs Are Mapped

Each selected tab becomes a table in your warehouse:

The first row is treated as the header row — column names are derived from these values
All subsequent rows are treated as data rows
The table name is derived from the spreadsheet name and tab name (e.g., customer_segments_sheet1)

You can override the target table name during configuration.

Spreadsheet Selection

You can add spreadsheets by:

URL — Paste the Google Sheets URL directly (e.g., https://docs.google.com/spreadsheets/d/1BxiM...)
Browse — Search and select from spreadsheets in your Google Drive
Spreadsheet ID — Enter the spreadsheet ID from the URL

After selecting a spreadsheet, SignalSmith shows all tabs in the spreadsheet. Select the tabs you want to sync.

Configuration

Setting	Description	Default
Spreadsheets	List of spreadsheet URLs or IDs	— (required)
Tabs	Which tabs to sync from each spreadsheet	All tabs
Header Row	Row number to use as column headers	`1`
Data Start Row	Row number where data begins	`2`
Named Range	Optional named range to limit extraction (e.g., `A1:F100`)	Entire sheet
Target Schema	Warehouse schema for Google Sheets tables	— (required)
Table Prefix	Optional prefix for table names	`gsheet_`
Schedule	Sync frequency	Daily

Column Type Detection

SignalSmith automatically detects column types from the data:

Detected Pattern	Warehouse Type
Integers (e.g., `42`, `1000`)	`BIGINT`
Decimals (e.g., `3.14`, `99.99`)	`DOUBLE`
Dates (e.g., `2025-01-15`, `1/15/2025`)	`DATE`
Timestamps (e.g., `2025-01-15 10:30:00`)	`TIMESTAMP`
Booleans (e.g., `TRUE`, `FALSE`, `yes`, `no`)	`BOOLEAN`
Everything else	`VARCHAR`

If a column contains mixed types, it defaults to VARCHAR. You can override column types during configuration.

Scheduling Notes

Full Refresh only: Google Sheets does not have a change tracking mechanism, so every sync is a full refresh. The entire sheet is re-extracted and the warehouse table is replaced.
Rate limits: The Google Sheets API allows 60 requests per minute per user. Each tab extraction consumes one request. SignalSmith handles rate limiting automatically.
Sheet size limits: Google Sheets has a limit of 10 million cells per spreadsheet. For very large sheets (100,000+ rows), extraction may take several minutes.
Formulas: SignalSmith reads the computed values of cells, not the formulas. If a cell contains =SUM(A1:A10), the loader extracts the calculated result.
Formatting: Cell formatting (colors, fonts, borders) is ignored. Only cell values are extracted.
Empty rows: Trailing empty rows at the bottom of the sheet are automatically trimmed. Empty rows in the middle of data are preserved.
Concurrent edits: If the spreadsheet is being edited during a sync, the loader reads a point-in-time snapshot. Changes made during extraction are not partially captured.

Common Use Cases

Use Case	Description
Reference tables	Mapping tables, segment definitions, or lookup data maintained by business teams
Manual data entry	Data collected manually that doesn’t exist in any API (e.g., event attendance, partnership tiers)
Seed lists	Email lists, account lists, or audience seeds for targeting
Configuration	Dynamic configuration values that change more frequently than code deploys
Ad hoc analysis	Temporary datasets for one-off analysis or testing

Troubleshooting

Issue	Solution
”Spreadsheet not found”	Verify the spreadsheet URL or ID is correct. Ensure the authenticated Google account has access
”Unable to parse header row”	The first row must contain non-empty, unique column headers. Remove or rename duplicate headers
Column types are wrong	Override column types in the loader configuration. Mixed-type columns default to VARCHAR
”Quota exceeded”	Google Sheets API rate limit. Reduce the number of tabs or sync frequency
Empty table after sync	Verify the spreadsheet has data below the header row. Check the “Data Start Row” setting
Special characters in headers	SignalSmith sanitizes header names for warehouse compatibility (spaces become underscores, special characters are removed)
Sheet with merged cells	Merged cells may cause unexpected behavior. Unmerge cells before syncing for best results
Very slow extraction	Large spreadsheets (100K+ rows) may take minutes. Consider using a named range to limit the data extracted

Next Steps

Create a model using your Google Sheets data
Join spreadsheet reference data with CRM or product data in your warehouse
Build audiences using seed lists from Google Sheets

Jira Slack