LoadersGoogle Sheets

Google Sheets Loader

The Google Sheets loader pulls spreadsheet data from Google Sheets into your data warehouse. It allows you to use Google Sheets as a lightweight data source for reference tables, manual data entry, or ad hoc datasets that don’t live in a traditional database.

Prerequisites

  • A Google account with access to the spreadsheets you want to sync
  • The spreadsheet must be organized with headers in the first row and data in subsequent rows
  • A connected Warehouse (target warehouse) with write permissions on the target schema

Authentication

The Google Sheets loader uses OAuth 2.0 for authentication.

OAuth 2.0 Setup

  1. In SignalSmith, click Add Loader and select Google Sheets
  2. Click Connect with Google
  3. You’ll be redirected to Google’s sign-in page
  4. Sign in with the Google account that has access to your spreadsheets
  5. Grant SignalSmith permission to read your Google Sheets data
  6. You’ll be redirected back to SignalSmith with the connection established

SignalSmith requests the following OAuth scope:

ScopePurpose
https://www.googleapis.com/auth/spreadsheets.readonlyRead-only access to Google Sheets
https://www.googleapis.com/auth/drive.readonlyRead-only access to discover spreadsheets in Google Drive

SignalSmith never modifies your spreadsheets. All access is strictly read-only.

Available Objects

Each spreadsheet tab (worksheet) is treated as a separate object. After connecting, you select which spreadsheets and tabs to sync.

ObjectDescriptionDefault Sync Mode
Spreadsheet TabA single worksheet within a Google Sheets spreadsheetFull Refresh

How Tabs Are Mapped

Each selected tab becomes a table in your warehouse:

  • The first row is treated as the header row — column names are derived from these values
  • All subsequent rows are treated as data rows
  • The table name is derived from the spreadsheet name and tab name (e.g., customer_segments_sheet1)

You can override the target table name during configuration.

Spreadsheet Selection

You can add spreadsheets by:

  • URL — Paste the Google Sheets URL directly (e.g., https://docs.google.com/spreadsheets/d/1BxiM...)
  • Browse — Search and select from spreadsheets in your Google Drive
  • Spreadsheet ID — Enter the spreadsheet ID from the URL

After selecting a spreadsheet, SignalSmith shows all tabs in the spreadsheet. Select the tabs you want to sync.

Configuration

SettingDescriptionDefault
SpreadsheetsList of spreadsheet URLs or IDs— (required)
TabsWhich tabs to sync from each spreadsheetAll tabs
Header RowRow number to use as column headers1
Data Start RowRow number where data begins2
Named RangeOptional named range to limit extraction (e.g., A1:F100)Entire sheet
Target SchemaWarehouse schema for Google Sheets tables— (required)
Table PrefixOptional prefix for table namesgsheet_
ScheduleSync frequencyDaily

Column Type Detection

SignalSmith automatically detects column types from the data:

Detected PatternWarehouse Type
Integers (e.g., 42, 1000)BIGINT
Decimals (e.g., 3.14, 99.99)DOUBLE
Dates (e.g., 2025-01-15, 1/15/2025)DATE
Timestamps (e.g., 2025-01-15 10:30:00)TIMESTAMP
Booleans (e.g., TRUE, FALSE, yes, no)BOOLEAN
Everything elseVARCHAR

If a column contains mixed types, it defaults to VARCHAR. You can override column types during configuration.

Scheduling Notes

  • Full Refresh only: Google Sheets does not have a change tracking mechanism, so every sync is a full refresh. The entire sheet is re-extracted and the warehouse table is replaced.
  • Rate limits: The Google Sheets API allows 60 requests per minute per user. Each tab extraction consumes one request. SignalSmith handles rate limiting automatically.
  • Sheet size limits: Google Sheets has a limit of 10 million cells per spreadsheet. For very large sheets (100,000+ rows), extraction may take several minutes.
  • Formulas: SignalSmith reads the computed values of cells, not the formulas. If a cell contains =SUM(A1:A10), the loader extracts the calculated result.
  • Formatting: Cell formatting (colors, fonts, borders) is ignored. Only cell values are extracted.
  • Empty rows: Trailing empty rows at the bottom of the sheet are automatically trimmed. Empty rows in the middle of data are preserved.
  • Concurrent edits: If the spreadsheet is being edited during a sync, the loader reads a point-in-time snapshot. Changes made during extraction are not partially captured.

Common Use Cases

Use CaseDescription
Reference tablesMapping tables, segment definitions, or lookup data maintained by business teams
Manual data entryData collected manually that doesn’t exist in any API (e.g., event attendance, partnership tiers)
Seed listsEmail lists, account lists, or audience seeds for targeting
ConfigurationDynamic configuration values that change more frequently than code deploys
Ad hoc analysisTemporary datasets for one-off analysis or testing

Troubleshooting

IssueSolution
”Spreadsheet not found”Verify the spreadsheet URL or ID is correct. Ensure the authenticated Google account has access
”Unable to parse header row”The first row must contain non-empty, unique column headers. Remove or rename duplicate headers
Column types are wrongOverride column types in the loader configuration. Mixed-type columns default to VARCHAR
”Quota exceeded”Google Sheets API rate limit. Reduce the number of tabs or sync frequency
Empty table after syncVerify the spreadsheet has data below the header row. Check the “Data Start Row” setting
Special characters in headersSignalSmith sanitizes header names for warehouse compatibility (spaces become underscores, special characters are removed)
Sheet with merged cellsMerged cells may cause unexpected behavior. Unmerge cells before syncing for best results
Very slow extractionLarge spreadsheets (100K+ rows) may take minutes. Consider using a named range to limit the data extracted

Next Steps

  • Create a model using your Google Sheets data
  • Join spreadsheet reference data with CRM or product data in your warehouse
  • Build audiences using seed lists from Google Sheets