Google Sheets Loader
The Google Sheets loader pulls spreadsheet data from Google Sheets into your data warehouse. It allows you to use Google Sheets as a lightweight data source for reference tables, manual data entry, or ad hoc datasets that don’t live in a traditional database.
Prerequisites
- A Google account with access to the spreadsheets you want to sync
- The spreadsheet must be organized with headers in the first row and data in subsequent rows
- A connected Warehouse (target warehouse) with write permissions on the target schema
Authentication
The Google Sheets loader uses OAuth 2.0 for authentication.
OAuth 2.0 Setup
- In SignalSmith, click Add Loader and select Google Sheets
- Click Connect with Google
- You’ll be redirected to Google’s sign-in page
- Sign in with the Google account that has access to your spreadsheets
- Grant SignalSmith permission to read your Google Sheets data
- You’ll be redirected back to SignalSmith with the connection established
SignalSmith requests the following OAuth scope:
| Scope | Purpose |
|---|---|
https://www.googleapis.com/auth/spreadsheets.readonly | Read-only access to Google Sheets |
https://www.googleapis.com/auth/drive.readonly | Read-only access to discover spreadsheets in Google Drive |
SignalSmith never modifies your spreadsheets. All access is strictly read-only.
Available Objects
Each spreadsheet tab (worksheet) is treated as a separate object. After connecting, you select which spreadsheets and tabs to sync.
| Object | Description | Default Sync Mode |
|---|---|---|
| Spreadsheet Tab | A single worksheet within a Google Sheets spreadsheet | Full Refresh |
How Tabs Are Mapped
Each selected tab becomes a table in your warehouse:
- The first row is treated as the header row — column names are derived from these values
- All subsequent rows are treated as data rows
- The table name is derived from the spreadsheet name and tab name (e.g.,
customer_segments_sheet1)
You can override the target table name during configuration.
Spreadsheet Selection
You can add spreadsheets by:
- URL — Paste the Google Sheets URL directly (e.g.,
https://docs.google.com/spreadsheets/d/1BxiM...) - Browse — Search and select from spreadsheets in your Google Drive
- Spreadsheet ID — Enter the spreadsheet ID from the URL
After selecting a spreadsheet, SignalSmith shows all tabs in the spreadsheet. Select the tabs you want to sync.
Configuration
| Setting | Description | Default |
|---|---|---|
| Spreadsheets | List of spreadsheet URLs or IDs | — (required) |
| Tabs | Which tabs to sync from each spreadsheet | All tabs |
| Header Row | Row number to use as column headers | 1 |
| Data Start Row | Row number where data begins | 2 |
| Named Range | Optional named range to limit extraction (e.g., A1:F100) | Entire sheet |
| Target Schema | Warehouse schema for Google Sheets tables | — (required) |
| Table Prefix | Optional prefix for table names | gsheet_ |
| Schedule | Sync frequency | Daily |
Column Type Detection
SignalSmith automatically detects column types from the data:
| Detected Pattern | Warehouse Type |
|---|---|
Integers (e.g., 42, 1000) | BIGINT |
Decimals (e.g., 3.14, 99.99) | DOUBLE |
Dates (e.g., 2025-01-15, 1/15/2025) | DATE |
Timestamps (e.g., 2025-01-15 10:30:00) | TIMESTAMP |
Booleans (e.g., TRUE, FALSE, yes, no) | BOOLEAN |
| Everything else | VARCHAR |
If a column contains mixed types, it defaults to VARCHAR. You can override column types during configuration.
Scheduling Notes
- Full Refresh only: Google Sheets does not have a change tracking mechanism, so every sync is a full refresh. The entire sheet is re-extracted and the warehouse table is replaced.
- Rate limits: The Google Sheets API allows 60 requests per minute per user. Each tab extraction consumes one request. SignalSmith handles rate limiting automatically.
- Sheet size limits: Google Sheets has a limit of 10 million cells per spreadsheet. For very large sheets (100,000+ rows), extraction may take several minutes.
- Formulas: SignalSmith reads the computed values of cells, not the formulas. If a cell contains
=SUM(A1:A10), the loader extracts the calculated result. - Formatting: Cell formatting (colors, fonts, borders) is ignored. Only cell values are extracted.
- Empty rows: Trailing empty rows at the bottom of the sheet are automatically trimmed. Empty rows in the middle of data are preserved.
- Concurrent edits: If the spreadsheet is being edited during a sync, the loader reads a point-in-time snapshot. Changes made during extraction are not partially captured.
Common Use Cases
| Use Case | Description |
|---|---|
| Reference tables | Mapping tables, segment definitions, or lookup data maintained by business teams |
| Manual data entry | Data collected manually that doesn’t exist in any API (e.g., event attendance, partnership tiers) |
| Seed lists | Email lists, account lists, or audience seeds for targeting |
| Configuration | Dynamic configuration values that change more frequently than code deploys |
| Ad hoc analysis | Temporary datasets for one-off analysis or testing |
Troubleshooting
| Issue | Solution |
|---|---|
| ”Spreadsheet not found” | Verify the spreadsheet URL or ID is correct. Ensure the authenticated Google account has access |
| ”Unable to parse header row” | The first row must contain non-empty, unique column headers. Remove or rename duplicate headers |
| Column types are wrong | Override column types in the loader configuration. Mixed-type columns default to VARCHAR |
| ”Quota exceeded” | Google Sheets API rate limit. Reduce the number of tabs or sync frequency |
| Empty table after sync | Verify the spreadsheet has data below the header row. Check the “Data Start Row” setting |
| Special characters in headers | SignalSmith sanitizes header names for warehouse compatibility (spaces become underscores, special characters are removed) |
| Sheet with merged cells | Merged cells may cause unexpected behavior. Unmerge cells before syncing for best results |
| Very slow extraction | Large spreadsheets (100K+ rows) may take minutes. Consider using a named range to limit the data extracted |
Next Steps
- Create a model using your Google Sheets data
- Join spreadsheet reference data with CRM or product data in your warehouse
- Build audiences using seed lists from Google Sheets