Loaders
Loaders pull data from SaaS applications and third-party services into your data warehouse on a recurring schedule. Instead of building and maintaining custom ETL pipelines, you configure a loader in SignalSmith, and it handles extraction, schema mapping, and incremental synchronization automatically.
What Is a Loader?
A Loader is a managed ingestion pipeline that connects to a SaaS application’s API, extracts data from one or more objects (e.g., Salesforce Contacts, Stripe Subscriptions), and writes it into your data warehouse. Once the data is in your warehouse, it becomes available to Warehouses, Models, and the rest of the SignalSmith platform.
Loaders complement Warehouses. While a Warehouse gives SignalSmith read access to data already in your warehouse, a Loader brings external data into your warehouse in the first place.
Supported Connectors
SignalSmith supports 14 loader connectors spanning CRM, marketing, advertising, payments, support, e-commerce, developer tooling, project management, productivity, and collaboration categories.
| Connector | Category | Authentication | Key Objects |
|---|---|---|---|
| Salesforce | CRM | OAuth 2.0 | Contacts, Leads, Accounts, Opportunities |
| HubSpot | CRM | OAuth 2.0 / API Key | Contacts, Companies, Deals |
| Stripe | Payments | API Key | Customers, Charges, Subscriptions |
| Zendesk | Support | API Token | Tickets, Users, Organizations |
| Intercom | Support | OAuth 2.0 | Contacts, Companies, Conversations |
| Marketo | Marketing | OAuth 2.0 | Leads, Lists, Programs, Activities |
| Google Ads | Advertising | OAuth 2.0 | Campaigns, Ad Groups, Performance |
| Facebook Ads | Advertising | OAuth 2.0 | Campaigns, Ad Sets, Insights |
| LinkedIn Ads | Advertising | OAuth 2.0 | Campaigns, Creatives, Analytics |
| Shopify | E-commerce | OAuth 2.0 | Orders, Products, Customers |
| GitHub | Developer | OAuth 2.0 / PAT | Repos, Issues, Pull Requests |
| Jira | Project Management | OAuth 2.0 / API Token | Issues, Projects, Sprints |
| Google Sheets | Productivity | OAuth 2.0 | Spreadsheet Tabs |
| Slack | Collaboration | OAuth 2.0 | Messages, Channels, Users |
How Loaders Work
1. Connect
Authenticate with the source application using OAuth 2.0, API keys, or access tokens. SignalSmith securely stores credentials and handles token refresh automatically.
2. Discover
Once connected, SignalSmith discovers the available objects and streams from the application’s API. You select which objects to sync — there’s no need to extract everything.
3. Map
SignalSmith automatically maps the source application’s schema to warehouse-compatible table definitions. Each selected object becomes a table in your target schema. You can customize column names, types, and which fields to include or exclude.
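The mapping step can be sketched as a small translation function. The type table and function below are illustrative assumptions, not SignalSmith's actual mapping rules; they show how a source object's fields, an include list, and column renames might combine into warehouse column definitions:

```python
# Illustrative mapping from source API field types to warehouse column
# types; the real type rules depend on the connector and warehouse.
TYPE_MAP = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
}

def map_schema(fields, include=None, renames=None):
    """Translate a source object's fields into warehouse column
    definitions, honoring an optional include list and column renames."""
    renames = renames or {}
    cols = {}
    for name, ftype in fields.items():
        if include is not None and name not in include:
            continue  # field explicitly excluded from the sync
        cols[renames.get(name, name)] = TYPE_MAP.get(ftype, "VARCHAR")
    return cols
```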
4. Schedule
Configure a sync schedule that determines how frequently data is pulled. Options range from every 15 minutes to daily. SignalSmith uses incremental sync by default, pulling only records that have changed since the last run.
5. Load
On each scheduled run, SignalSmith extracts changed records from the source API, transforms them into the target schema, and writes them to your warehouse. Failed runs are automatically retried, and you can monitor progress from the Loaders dashboard.
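The automatic retry behavior described above can be sketched as a small helper. The function name and backoff policy here are illustrative only, not SignalSmith's actual retry implementation:

```python
import time

def run_with_retries(extract_fn, max_attempts=3, base_delay=1.0):
    """Run an extraction callable, retrying failed attempts with
    exponential backoff. Returns the result of the first success."""
    for attempt in range(1, max_attempts + 1):
        try:
            return extract_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the error to monitoring
            time.sleep(base_delay * 2 ** (attempt - 1))
```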
Sync Modes
Loaders support two primary sync modes:
| Mode | Description | Use Case |
|---|---|---|
| Full Refresh | Replaces the entire table with a fresh extract on each run | Small reference tables, lookup data, or when the source API doesn’t support change tracking |
| Incremental | Pulls only new and updated records since the last successful sync | Large transaction tables, event streams, or any dataset with a reliable updated_at timestamp |
Incremental sync uses a cursor field (usually a timestamp like updated_at or modified_date) to track progress. SignalSmith persists the cursor value between runs, so each execution picks up exactly where the last one left off.
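The cursor mechanics can be sketched as follows. This is a simplified illustration under the assumption that the source API returns records with an `updated_at` field already filtered to values after the cursor; real connectors also handle pagination and late-arriving data:

```python
def incremental_sync(fetch_page, cursor):
    """Pull records updated after `cursor` and return them along with
    the new cursor (the maximum updated_at value observed).

    `fetch_page(cursor)` is assumed to return only records whose
    updated_at is strictly greater than the cursor.
    """
    records = fetch_page(cursor)
    if not records:
        return [], cursor  # nothing changed; the cursor stays put
    new_cursor = max(r["updated_at"] for r in records)
    return records, new_cursor
```

Persisting `new_cursor` between runs is what lets each execution resume exactly where the last one left off.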
Target Warehouse Configuration
Loader data is written to a schema in your connected data warehouse. You configure the target location when creating a loader:
| Setting | Description | Example |
|---|---|---|
| Source | The warehouse source connection to write into | Production Snowflake |
| Schema | The target schema for loader tables | SALESFORCE_RAW, HUBSPOT_DATA |
| Table Prefix | Optional prefix applied to all table names | sf_, hs_ |
SignalSmith creates tables automatically in the target schema. If tables already exist, the loader appends or merges data based on the sync mode.
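A minimal sketch of how the prefix and write behavior might combine, assuming a simple lowercase-and-prefix naming rule (the actual naming and merge logic are not specified here):

```python
def target_table_name(object_name, prefix=""):
    """Derive a warehouse table name for a source object, applying
    the optional loader-level table prefix."""
    return f"{prefix}{object_name.lower().replace(' ', '_')}"

def write_strategy(table_exists, sync_mode):
    """Pick a write strategy per the rules above: full refresh replaces
    the table; incremental merges into an existing table, or creates it
    on the first run."""
    if sync_mode == "full_refresh":
        return "replace"
    return "merge" if table_exists else "create"
```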
Scheduling
Loaders run on configurable schedules:
| Interval | Description |
|---|---|
| Every 15 minutes | Near real-time for critical data |
| Hourly | Good balance of freshness and API usage |
| Every 6 hours | Suitable for most operational data |
| Daily | Best for reference data or high-volume extracts |
| Custom cron | Full cron expression for advanced scheduling |
All schedules are evaluated in UTC. You can pause and resume loaders at any time without losing cursor state.
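Under standard five-field cron semantics, the named intervals above correspond to expressions like these (an illustrative mapping; SignalSmith's internal schedule representation is not specified):

```python
# Named intervals expressed as standard cron, all evaluated in UTC.
INTERVAL_CRON = {
    "every_15_minutes": "*/15 * * * *",
    "hourly": "0 * * * *",
    "every_6_hours": "0 */6 * * *",
    "daily": "0 0 * * *",
}
```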
Monitoring
Each loader run produces a detailed execution log:
- Status — Success, Failed, or Running
- Records extracted — Number of records pulled from the source API
- Records loaded — Number of records written to the warehouse
- Duration — Wall-clock time of the run
- Errors — Any API errors, rate limit hits, or schema conflicts
You can view run history from the Loaders dashboard or query the API for programmatic monitoring.
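For programmatic monitoring, run history can be summarized client-side. The field names below (`status`, `records_loaded`) mirror the log fields listed above but are assumptions about the API's exact response shape:

```python
def summarize_runs(runs):
    """Summarize a list of run records, e.g. from the run-history
    endpoint: counts per status plus total records loaded."""
    summary = {"success": 0, "failed": 0, "running": 0, "records_loaded": 0}
    for run in runs:
        status = run["status"].lower()
        summary[status] = summary.get(status, 0) + 1
        summary["records_loaded"] += run.get("records_loaded", 0)
    return summary
```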
API Reference
Loaders are managed through the SignalSmith REST API:
# List all loaders
GET /api/v1/loaders
# Get a single loader
GET /api/v1/loaders/{id}
# Create a loader
POST /api/v1/loaders
# Update a loader
PUT /api/v1/loaders/{id}
# Delete a loader
DELETE /api/v1/loaders/{id}
# Trigger a manual run
POST /api/v1/loaders/{id}/run
# Get run history
GET /api/v1/loaders/{id}/runs
See the API Reference for full request/response schemas.
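The endpoints above can be called from any HTTP client. This sketch builds authenticated requests with the standard library; the base URL and bearer-token auth header are assumptions, so substitute your actual host and credentials:

```python
import json
import urllib.request

BASE_URL = "https://api.signalsmith.example/api/v1"  # placeholder host

def loader_request(method, path, token, body=None):
    """Build an authenticated request against the loader endpoints
    listed above (request is constructed, not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# e.g., trigger a manual run for a hypothetical loader with id 42:
req = loader_request("POST", "/loaders/42/run", token="YOUR_TOKEN")
```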
Best Practices
- Use a dedicated schema — Write loader data into a separate schema (e.g., SALESFORCE_RAW) to keep it isolated from your curated models and analytics tables.
- Start with incremental sync — Incremental mode is faster, cheaper, and puts less load on both the source API and your warehouse.
- Monitor API quotas — Some source applications have API rate limits. If you’re loading many objects at high frequency, check that your API plan supports the volume.
- Schedule off-peak — For large full-refresh loads, schedule runs during off-peak hours to minimize impact on your warehouse.
- Use table prefixes — If multiple loaders write to the same schema, use prefixes to avoid naming collisions and make tables easy to identify.
Next Steps
- Create your first loader
- Choose a connector guide: Salesforce | HubSpot | Stripe | Zendesk