Creating a Loader
This guide walks you through setting up a new loader in SignalSmith. A loader connects to a SaaS application, extracts data from selected objects, and writes it into your data warehouse on a recurring schedule.
Prerequisites
Before creating a loader, ensure you have:
- A SignalSmith workspace with appropriate permissions (Owner, Admin, or a role with the `loaders:create` permission)
- A configured Warehouse (the target warehouse where loader data will be written)
- Credentials for the SaaS application you want to connect (OAuth access or API key)
- Write permissions on the target schema in your warehouse
Step-by-Step Guide
Step 1: Navigate to Loaders
- Log in to your SignalSmith workspace
- Click Loaders in the left sidebar
- Click the Add Loader button in the top-right corner
Step 2: Select the Source Application
Choose the SaaS application you want to pull data from. SignalSmith supports 15+ connectors across CRM, marketing, advertising, payments, support, and productivity categories.
Each connector has its own authentication flow and available objects. See the individual connector guides for detailed setup instructions.
Step 3: Authenticate
Depending on the connector, you’ll authenticate using one of the following methods:
| Method | Flow | Connectors |
|---|---|---|
| OAuth 2.0 | Click “Connect” and authorize via the application’s login page. SignalSmith handles token storage and refresh. | Salesforce, HubSpot, Google Ads, Facebook Ads, LinkedIn Ads, Shopify, Intercom, GitHub, Slack |
| API Key | Paste your API key or secret directly into the configuration form. | Stripe, HubSpot (alternative), Zendesk |
| OAuth 2.0 Client Credentials | Provide your client ID and client secret. SignalSmith exchanges them for an access token. | Marketo |
| API Token | Provide a personal or workspace API token along with your account identifier. | Zendesk, Jira |
| Personal Access Token (PAT) | Generate a token in the application’s developer settings and paste it into SignalSmith. | GitHub (alternative), Jira (alternative) |
All credentials are encrypted at rest using AES-256 encryption.
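When creating a loader programmatically, the credentials from this step go into the request's `auth` object. The OAuth 2.0 shape below matches the Salesforce example in Using the API later in this guide; field names for the other methods (API key, client credentials, tokens) vary by connector, so check the individual connector guides:

```json
{
  "auth": {
    "type": "oauth2",
    "access_token": "...",
    "refresh_token": "...",
    "instance_url": "https://mycompany.my.salesforce.com"
  }
}
```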
Step 4: Discover and Select Objects
After authentication, SignalSmith discovers the available objects and streams from the connected application. You’ll see a list of all available objects with metadata:
- Object name — The API name of the object (e.g., `Contact`, `deals`, `charges`)
- Record count — Estimated number of records (when available from the API)
- Sync mode — Whether the object supports incremental sync or requires full refresh
- Cursor field — The field used for incremental sync (e.g., `updated_at`, `SystemModstamp`)
Select the objects you want to sync. You don’t need to select everything — choose only the objects that are relevant to your use case to minimize API usage and warehouse storage.
For each selected object, you can optionally:
- Include/exclude fields — Deselect fields you don’t need to reduce table width and storage
- Rename the target table — Override the default table name in your warehouse
- Set primary key — Specify which field(s) uniquely identify a record for deduplication
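In an API request, these per-object options map onto entries in the `objects` array. The `name`, `sync_mode`, `cursor_field`, and `primary_key` keys appear in the full example under Using the API; the `fields` and `table_name` keys sketched here for field selection and table renaming are assumptions — verify the exact key names against the API reference:

```json
{
  "name": "Contact",
  "sync_mode": "incremental",
  "cursor_field": "SystemModstamp",
  "primary_key": ["Id"],
  "fields": ["Id", "Email", "LastName", "SystemModstamp"],
  "table_name": "contacts"
}
```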
Step 5: Configure the Target Warehouse
Specify where loader data should be written:
| Setting | Description | Example |
|---|---|---|
| Target Source | The warehouse source connection to write into | Production Snowflake |
| Target Schema | The schema where tables will be created | SALESFORCE_RAW |
| Table Prefix | Optional prefix for all table names created by this loader | sf_ |
SignalSmith creates tables automatically. If a table already exists, the loader merges data based on the sync mode and primary key configuration.
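In an API request these settings form the `target` object, as shown in the full example under Using the API. With the configuration below, a synced `Contact` object would land in a table carrying the `sf_` prefix (exact table name casing depends on the warehouse):

```json
{
  "target": {
    "source_id": "src_abc123",
    "schema": "SALESFORCE_RAW",
    "table_prefix": "sf_"
  }
}
```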
Step 6: Configure the Schedule
Set the sync frequency:
| Interval | Best For |
|---|---|
| Every 15 minutes | Critical operational data (e.g., support tickets, orders) |
| Hourly | General-purpose, balances freshness with API efficiency |
| Every 6 hours | Operational data that doesn’t need real-time freshness |
| Daily | Reference data, large full-refresh tables, cost-sensitive workloads |
| Custom cron | Advanced scheduling needs (e.g., `0 2 * * 1-5` for weekday 2 AM runs) |
All schedules are evaluated in UTC. You can also choose to leave the schedule paused and trigger runs manually.
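In an API request the schedule is an object with an `interval` key, as in the example under Using the API. For custom cron schedules, a `cron` key is a plausible shape — this key name is an assumption, so confirm it against the API reference (and remember the expression is evaluated in UTC):

```json
{
  "schedule": {
    "cron": "0 2 * * 1-5"
  }
}
```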
Step 7: Name and Save
Give your loader a descriptive name (e.g., “Salesforce Production” or “Stripe Payments”) and click Save.
SignalSmith will:
- Create the target tables in your warehouse
- Run an initial full sync to backfill historical data
- Begin the recurring schedule for incremental syncs
The duration of the initial sync depends on data volume; large backfills can take hours. You can monitor progress from the loader’s detail page.
Using the API
You can create loaders programmatically via the REST API:
```shell
curl -X POST https://your-workspace.signalsmith.dev/api/v1/loaders \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Salesforce Production",
    "connector": "salesforce",
    "auth": {
      "type": "oauth2",
      "access_token": "...",
      "refresh_token": "...",
      "instance_url": "https://mycompany.my.salesforce.com"
    },
    "objects": [
      {
        "name": "Contact",
        "sync_mode": "incremental",
        "cursor_field": "SystemModstamp",
        "primary_key": ["Id"]
      },
      {
        "name": "Account",
        "sync_mode": "incremental",
        "cursor_field": "SystemModstamp",
        "primary_key": ["Id"]
      }
    ],
    "target": {
      "source_id": "src_abc123",
      "schema": "SALESFORCE_RAW",
      "table_prefix": "sf_"
    },
    "schedule": {
      "interval": "hourly"
    }
  }'
```

The API response includes the created loader with its `id` and initial run status:

```json
{
  "id": "ldr_xyz789",
  "name": "Salesforce Production",
  "connector": "salesforce",
  "status": "running",
  "objects": ["Contact", "Account"],
  "schedule": {
    "interval": "hourly",
    "next_run_at": "2025-01-15T12:00:00Z"
  },
  "created_at": "2025-01-15T11:00:00Z"
}
```

Managing Loaders
Editing a Loader
To modify an existing loader:
- Navigate to Loaders in the sidebar
- Click on the loader you want to edit
- Modify objects, schedule, or target configuration
- Click Save
Adding new objects triggers a backfill for those objects. Removing objects does not delete the corresponding warehouse tables — you must drop them manually if desired.
Pausing and Resuming
You can pause a loader to temporarily stop scheduled runs without losing cursor state. Click Pause on the loader’s detail page. Resume when ready — the next run picks up exactly where it left off.
Manual Runs
Click Run Now to trigger an immediate sync outside the regular schedule. This is useful for testing configuration changes or backfilling after a pause.
Deleting a Loader
Deleting a loader stops all scheduled runs and removes the loader configuration. Existing data in your warehouse is not affected — tables and data remain until you manually clean them up.
Common Issues
| Issue | Solution |
|---|---|
| OAuth token expired | Re-authenticate by clicking “Reconnect” on the loader detail page |
| API rate limit exceeded | Reduce sync frequency or select fewer objects |
| Target schema doesn’t exist | Create the schema in your warehouse before saving the loader |
| “Permission denied” writing to warehouse | Ensure the source connection user has write access to the target schema |
| Initial sync taking too long | Large datasets may take hours for the initial backfill — subsequent incremental syncs will be much faster |
| Missing records after sync | Verify the cursor field is correctly set and that the source API returns expected data for the date range |
Next Steps
- Choose a connector: Salesforce | HubSpot | Stripe | Zendesk
- Monitor your data pipeline from the Loaders dashboard
- Create a model using the loaded data