Loaders
Loaders pull data from SaaS applications and third-party services into your data warehouse on a recurring schedule. Instead of building and maintaining custom ETL pipelines, you configure a loader in SignalSmith, and it handles extraction, schema mapping, and incremental synchronization automatically.
What Is a Loader?
A Loader is a managed ingestion pipeline that connects to a SaaS application’s API, extracts data from one or more objects (e.g., Salesforce Contacts, Stripe Subscriptions), and writes it into your data warehouse. Once the data is in your warehouse, it becomes available to Warehouses, Models, and the rest of the SignalSmith platform.
Loaders complement Warehouses. While a Warehouse gives SignalSmith read access to data already in your warehouse, a Loader brings external data into your warehouse in the first place.
Supported Connectors
SignalSmith supports 14 loader connectors spanning CRM, marketing, advertising, payments, support, e-commerce, developer tooling, project management, productivity, and collaboration categories.
| Connector | Category | Authentication | Key Objects |
|---|---|---|---|
| Salesforce | CRM | OAuth 2.0 | Contacts, Leads, Accounts, Opportunities |
| HubSpot | CRM | OAuth 2.0 / API Key | Contacts, Companies, Deals |
| Stripe | Payments | API Key | Customers, Charges, Subscriptions |
| Zendesk | Support | API Token | Tickets, Users, Organizations |
| Intercom | Support | OAuth 2.0 | Contacts, Companies, Conversations |
| Marketo | Marketing | OAuth 2.0 | Leads, Lists, Programs, Activities |
| Google Ads | Advertising | OAuth 2.0 | Campaigns, Ad Groups, Performance |
| Facebook Ads | Advertising | OAuth 2.0 | Campaigns, Ad Sets, Insights |
| LinkedIn Ads | Advertising | OAuth 2.0 | Campaigns, Creatives, Analytics |
| Shopify | E-commerce | OAuth 2.0 | Orders, Products, Customers |
| GitHub | Developer | OAuth 2.0 / PAT | Repos, Issues, Pull Requests |
| Jira | Project Management | OAuth 2.0 / API Token | Issues, Projects, Sprints |
| Google Sheets | Productivity | OAuth 2.0 | Spreadsheet Tabs |
| Slack | Collaboration | OAuth 2.0 | Messages, Channels, Users |
How Loaders Work
1. Connect
Authenticate with the source application using OAuth 2.0, API keys, or access tokens. SignalSmith securely stores credentials and handles token refresh automatically.
2. Discover
Once connected, SignalSmith discovers the available objects and streams from the application’s API. You select which objects to sync — there’s no need to extract everything.
3. Map
SignalSmith automatically maps the source application’s schema to warehouse-compatible table definitions. Each selected object becomes a table in your target schema. You can customize column names, types, and which fields to include or exclude.
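The mapping step can be sketched as a small translation function. The type table and function below are illustrative assumptions, not SignalSmith's actual mapping rules; they show how a source object's fields, an include list, and column renames might combine into warehouse column definitions:

```python
# Illustrative mapping from source API field types to warehouse column
# types; the real type rules depend on the connector and warehouse.
TYPE_MAP = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
}

def map_schema(fields, include=None, renames=None):
    """Translate a source object's fields into warehouse column
    definitions, honoring an optional include list and column renames."""
    renames = renames or {}
    cols = {}
    for name, ftype in fields.items():
        if include is not None and name not in include:
            continue  # field explicitly excluded from the sync
        cols[renames.get(name, name)] = TYPE_MAP.get(ftype, "VARCHAR")
    return cols
```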
4. Schedule
Configure a sync schedule that determines how frequently data is pulled. Options range from every 15 minutes to daily. SignalSmith uses incremental sync by default, pulling only records that have changed since the last run.
5. Load
On each scheduled run, SignalSmith extracts changed records from the source API, transforms them into the target schema, and writes them to your warehouse. Failed runs are automatically retried, and you can monitor progress from the Loaders dashboard.
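The automatic retry behavior described above can be sketched as a small helper. The function name and backoff policy here are illustrative only, not SignalSmith's actual retry implementation:

```python
import time

def run_with_retries(extract_fn, max_attempts=3, base_delay=1.0):
    """Run an extraction callable, retrying failed attempts with
    exponential backoff. Returns the result of the first success."""
    for attempt in range(1, max_attempts + 1):
        try:
            return extract_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the error to monitoring
            time.sleep(base_delay * 2 ** (attempt - 1))
```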
Sync Modes
Loaders support two primary sync modes:
| Mode | Description | Use Case |
|---|---|---|
| Full Refresh | Replaces the entire table with a fresh extract on each run | Small reference tables, lookup data, or when the source API doesn’t support change tracking |
| Incremental | Pulls only new and updated records since the last successful sync | Large transaction tables, event streams, or any dataset with a reliable updated_at timestamp |
Incremental sync uses a cursor field (usually a timestamp like updated_at or modified_date) to track progress. SignalSmith persists the cursor value between runs, so each execution picks up exactly where the last one left off.
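The cursor mechanics can be sketched as follows. This is a simplified illustration under the assumption that the source API returns records with an `updated_at` field already filtered to values after the cursor; real connectors also handle pagination and late-arriving data:

```python
def incremental_sync(fetch_page, cursor):
    """Pull records updated after `cursor` and return them along with
    the new cursor (the maximum updated_at value observed).

    `fetch_page(cursor)` is assumed to return only records whose
    updated_at is strictly greater than the cursor.
    """
    records = fetch_page(cursor)
    if not records:
        return [], cursor  # nothing changed; the cursor stays put
    new_cursor = max(r["updated_at"] for r in records)
    return records, new_cursor
```

Persisting `new_cursor` between runs is what lets each execution resume exactly where the last one left off.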
Target Warehouse Configuration
Loader data is written to a schema in your connected data warehouse. You configure the target location when creating a loader:
| Setting | Description | Example |
|---|---|---|
| Source | The warehouse source connection to write into | Production Snowflake |
| Schema | The target schema for loader tables | SALESFORCE_RAW, HUBSPOT_DATA |
| Table Prefix | Optional prefix applied to all table names | sf_, hs_ |
SignalSmith creates tables automatically in the target schema. If tables already exist, the loader appends or merges data based on the sync mode.
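A minimal sketch of how the prefix and write behavior might combine, assuming a simple lowercase-and-prefix naming rule (the actual naming and merge logic are not specified here):

```python
def target_table_name(object_name, prefix=""):
    """Derive a warehouse table name for a source object, applying
    the optional loader-level table prefix."""
    return f"{prefix}{object_name.lower().replace(' ', '_')}"

def write_strategy(table_exists, sync_mode):
    """Pick a write strategy per the rules above: full refresh replaces
    the table; incremental merges into an existing table, or creates it
    on the first run."""
    if sync_mode == "full_refresh":
        return "replace"
    return "merge" if table_exists else "create"
```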
Scheduling
Loaders run on configurable schedules:
| Interval | Description |
|---|---|
| Every 15 minutes | Near real-time for critical data |
| Hourly | Good balance of freshness and API usage |
| Every 6 hours | Suitable for most operational data |
| Daily | Best for reference data or high-volume extracts |
| Custom cron | Full cron expression for advanced scheduling |
All schedules are evaluated in UTC. You can pause and resume loaders at any time without losing cursor state.
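Under standard five-field cron semantics, the named intervals above correspond to expressions like these (an illustrative mapping; SignalSmith's internal schedule representation is not specified):

```python
# Named intervals expressed as standard cron, all evaluated in UTC.
INTERVAL_CRON = {
    "every_15_minutes": "*/15 * * * *",
    "hourly": "0 * * * *",
    "every_6_hours": "0 */6 * * *",
    "daily": "0 0 * * *",
}
```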
Monitoring
Each loader run produces a detailed execution log:
- Status — Success, Failed, or Running
- Records extracted — Number of records pulled from the source API
- Records loaded — Number of records written to the warehouse
- Duration — Wall-clock time of the run
- Errors — Any API errors, rate limit hits, or schema conflicts
You can view run history from the Loaders dashboard or query the API for programmatic monitoring.
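For programmatic monitoring, run history can be summarized client-side. The field names below (`status`, `records_loaded`) mirror the log fields listed above but are assumptions about the API's exact response shape:

```python
def summarize_runs(runs):
    """Summarize a list of run records, e.g. from the run-history
    endpoint: counts per status plus total records loaded."""
    summary = {"success": 0, "failed": 0, "running": 0, "records_loaded": 0}
    for run in runs:
        status = run["status"].lower()
        summary[status] = summary.get(status, 0) + 1
        summary["records_loaded"] += run.get("records_loaded", 0)
    return summary
```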
API Reference
Loaders are managed through the SignalSmith REST API:
# List all loaders
GET /api/v1/loaders
# Get a single loader
GET /api/v1/loaders/{id}
# Create a loader
POST /api/v1/loaders
# Update a loader
PUT /api/v1/loaders/{id}
# Delete a loader
DELETE /api/v1/loaders/{id}
# Trigger a manual run
POST /api/v1/loaders/{id}/run
# Get run history
GET /api/v1/loaders/{id}/runs
See the API Reference for full request/response schemas.
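The endpoints above can be called from any HTTP client. This sketch builds authenticated requests with the standard library; the base URL and bearer-token auth header are assumptions, so substitute your actual host and credentials:

```python
import json
import urllib.request

BASE_URL = "https://api.signalsmith.example/api/v1"  # placeholder host

def loader_request(method, path, token, body=None):
    """Build an authenticated request against the loader endpoints
    listed above (request is constructed, not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# e.g., trigger a manual run for a hypothetical loader with id 42:
req = loader_request("POST", "/loaders/42/run", token="YOUR_TOKEN")
```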
Best Practices
- Use a dedicated schema — Write loader data into a separate schema (e.g., SALESFORCE_RAW) to keep it isolated from your curated models and analytics tables.
- Start with incremental sync — Incremental mode is faster, cheaper, and puts less load on both the source API and your warehouse.
- Monitor API quotas — Some source applications have API rate limits. If you’re loading many objects at high frequency, check that your API plan supports the volume.
- Schedule off-peak — For large full-refresh loads, schedule runs during off-peak hours to minimize impact on your warehouse.
- Use table prefixes — If multiple loaders write to the same schema, use prefixes to avoid naming collisions and make tables easy to identify.
Next Steps
- Create your first loader
- Choose a connector guide: Salesforce | HubSpot | Stripe | Zendesk