Webhooks let you trigger Python datasets in Definite docs from external systems with a simple HTTPS POST. Send JSON to our endpoint, and Definite will execute your doc’s Python datasets with access to your data. This enables real-time pipelines for events like meeting transcripts, signups, payments, orders, alerts, and more.
How it works
1. Create a doc with a Python dataset that processes webhook data.
2. Send a POST to the webhook endpoint with your doc ID and JSON payload.
3. Authenticate using your API key as a bearer token.
4. Execute: Definite spins up a sandbox, writes your payload as a file, and runs the Python dataset.
5. Respond: You get a response with per-dataset execution results.
Endpoint
POST /v4/webhook/docs/{doc_id}/execute
Where {doc_id} is the UUID of the doc containing your Python dataset(s).
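As a minimal sketch of assembling the endpoint URL, the helper below validates the doc ID as a UUID before building the path. The function name `webhook_url` is hypothetical, and the base URL is taken from the curl examples on this page.

```python
import uuid

# Base URL as used in the curl examples on this page.
API_BASE = "https://api.definite.app"


def webhook_url(doc_id: str) -> str:
    """Build the execute URL for a doc (hypothetical helper, not part of any SDK)."""
    # uuid.UUID raises ValueError when doc_id is not a well-formed UUID.
    canonical = str(uuid.UUID(doc_id))
    return f"{API_BASE}/v4/webhook/docs/{canonical}/execute"


print(webhook_url("123e4567-e89b-12d3-a456-426614174000"))
```

Validating on the client catches a malformed ID before the request is sent. It cannot tell you whether the doc actually exists.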
Authentication
Authenticate with your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
API key format: {user_id}-{api_key_suffix}
Get your API key from the bottom-left user menu in the Definite app.
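One way to keep the key out of source code is to read it from the caller's environment, sketched below. The helper name `auth_headers` is hypothetical, and reusing the `DEFINITE_API_KEY` variable name on the sending side is an assumption. The docs only state that this variable exists inside the sandbox.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict[str, str]:
    """Return the bearer Authorization header (hypothetical helper)."""
    # Assumption: the sender stores its key in DEFINITE_API_KEY.
    key = api_key or os.environ.get("DEFINITE_API_KEY")
    if not key:
        raise ValueError("No API key provided and DEFINITE_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}


# Placeholder key for illustration only.
print(auth_headers("YOUR_API_KEY"))
```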
Request Body
{
  "data": {
    "event_type": "meeting_end",
    "transcript": [
      {"speaker": "Alice", "text": "Let's review Q1 results"}
    ],
    "summary": "Team reviewed Q1 results."
  },
  "environmentVariables": {
    "CUSTOM_VAR": "value"
  },
  "datasetKeys": ["process_webhook"]
}
| Field | Type | Description |
|---|---|---|
| data | object | Arbitrary JSON payload. Written as a file in the sandbox, accessible via the WEBHOOK_DATA_FILE env var. |
| environmentVariables | object | Additional environment variables injected into the sandbox. |
| datasetKeys | string[] | Execute only these datasets. Default: all Python datasets in the doc. |
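The three fields above can be assembled with a small helper, sketched here. The function name `build_webhook_body` is hypothetical. Leaving out the optional fields when they are empty is an assumption about what the API accepts, based on the basic curl example, which sends only `data`.

```python
from typing import Any, Optional


def build_webhook_body(
    data: dict[str, Any],
    environment_variables: Optional[dict[str, str]] = None,
    dataset_keys: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble the request body (hypothetical helper); empty optional fields are left out."""
    body: dict[str, Any] = {"data": data}
    if environment_variables:
        body["environmentVariables"] = environment_variables
    if dataset_keys:
        body["datasetKeys"] = dataset_keys
    return body


body = build_webhook_body(
    {"event_type": "meeting_end"},
    dataset_keys=["process_webhook"],
)
print(body)
```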
Data injection
Your data payload is written as a JSON file inside the execution sandbox. The file path is available via the WEBHOOK_DATA_FILE environment variable. This approach supports large payloads (e.g., full meeting transcripts) without hitting environment variable size limits.
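To try dataset code outside the sandbox, you can mimic this injection locally, as sketched below. The `simulate_injection` function is a hypothetical stand-in for what Definite does, not its actual implementation. Only the `WEBHOOK_DATA_FILE` variable name comes from the docs.

```python
import json
import os
import tempfile


def load_webhook_payload() -> dict:
    """Read the JSON payload from the path stored in WEBHOOK_DATA_FILE."""
    with open(os.environ["WEBHOOK_DATA_FILE"], encoding="utf-8") as f:
        return json.load(f)


def simulate_injection(payload: dict) -> str:
    """Local stand-in for the sandbox: write the payload to a temp file and export its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as f:
        json.dump(payload, f)
    os.environ["WEBHOOK_DATA_FILE"] = f.name
    return f.name


simulate_injection({"event_type": "meeting_end", "transcript": []})
payload = load_webhook_payload()
print(payload["event_type"])
```

The reader half, `load_webhook_payload`, is what you would put in the dataset itself. The writer half is only needed for local testing.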
Response
The response is wrapped in a standard v4 envelope:
{
  "success": true,
  "data": {
    "docId": "your-doc-uuid",
    "docName": "Webhook Processor",
    "results": {
      "process_webhook": {
        "success": true,
        "executionId": "uuid-string",
        "error": null
      }
    }
  },
  "meta": {
    "requestId": "uuid",
    "durationMs": 1234
  }
}
Each entry in results corresponds to a Python dataset that was executed:
| Field | Description |
|---|---|
| success | Whether the dataset executed without errors |
| executionId | UUID for retrieving execution logs |
| error | Error message if execution failed |
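Because results are reported per dataset, a caller will likely want to check each entry and not rely only on the top-level flag. The sketch below collects failures from an envelope shaped like the one above. The helper name `failed_datasets` and the sample values are made up for illustration.

```python
from typing import Any


def failed_datasets(envelope: dict[str, Any]) -> dict[str, str]:
    """Map each failed dataset key to its error message (hypothetical helper)."""
    results = (envelope.get("data") or {}).get("results") or {}
    return {
        key: result.get("error") or "unknown error"
        for key, result in results.items()
        if not result.get("success")
    }


# Sample envelope following the documented shape; values are invented.
sample = {
    "success": True,
    "data": {
        "results": {
            "process_webhook": {"success": True, "executionId": "uuid-1", "error": None},
            "process_payment": {"success": False, "executionId": "uuid-2", "error": "KeyError: 'amount'"},
        }
    },
}
print(failed_datasets(sample))
```

Whether the top-level `success` can be true while an individual dataset fails is not stated here, so this sketch does not assume either way.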
Reading webhook data in Python
import json, os
# Read the webhook payload from the file
data = json.loads(open(os.environ["WEBHOOK_DATA_FILE"]).read())
event_type = data.get("event_type")
transcript = data.get("transcript", [])
print(f"Processing {event_type} with {len(transcript)} turns")
# Access additional environment variables
custom_var = os.environ.get("CUSTOM_VAR")
Both DEFINITE_API_KEY and DEFINITE_API_BASE_URL are automatically available in the sandbox, so you can use the Definite SDK to write data back to DuckLake.
Example: Pipeline doc for webhook processing
Create a doc with a Python dataset that processes incoming webhook data:
version: 1
schemaVersion: "2025-01"
kind: pipeline
metadata:
  name: "Transcript Processor"
datasets:
  process_webhook:
    engine: python
    code: |
      import json, os
      from definite_sdk import DefiniteClient
      import duckdb

      # Read webhook payload
      data = json.loads(open(os.environ["WEBHOOK_DATA_FILE"]).read())

      # Set up DuckLake connection
      client = DefiniteClient(
          os.environ["DEFINITE_API_KEY"],
          api_url=os.environ["DEFINITE_API_BASE_URL"]
      )
      conn = duckdb.connect()
      conn.execute(client.attach_ducklake())

      # Write to DuckLake
      conn.execute("""
          INSERT INTO lake.transcripts.meetings
          VALUES (?, ?, ?, CURRENT_TIMESTAMP)
      """, [data["session_id"], data["title"], json.dumps(data["transcript"])])

      print(f"Ingested meeting: {data['title']}")
    timeoutMs: 120000
Examples
Basic webhook call
curl -X POST https://api.definite.app/v4/webhook/docs/YOUR_DOC_ID/execute \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "data": {
      "event_type": "user.created",
      "user_id": "123",
      "email": "user@example.com"
    }
  }'
With environment variables and dataset filter
curl -X POST https://api.definite.app/v4/webhook/docs/YOUR_DOC_ID/execute \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "data": {
      "event_type": "payment.succeeded",
      "amount": 99.99
    },
    "environmentVariables": {
      "STRIPE_KEY": "sk_test_123"
    },
    "datasetKeys": ["process_payment"]
  }'
Python client
import httpx

response = httpx.post(
    "https://api.definite.app/v4/webhook/docs/YOUR_DOC_ID/execute",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "data": {
            "session_id": "abc-123",
            "title": "Weekly Sync",
            "transcript": [
                {"speaker": "Alice", "text": "Let's review Q1"},
                {"speaker": "Bob", "text": "Revenue is up 20%"},
            ],
        }
    },
)
print(response.json())
Common use cases
- Meeting transcript ingestion: Receive webhooks from Read.ai, Otter.ai, or similar services
- Event processing: Payments, signups, order lifecycle events
- Streaming ingestion: Telemetry, IoT, monitoring alerts
- Workflow automation: Trigger transformations or enrichment on external events
- Third-party callbacks: Process responses from external integrations
Best practices
- Use bearer auth in the Authorization header.
- Use datasetKeys to target specific datasets when your doc has multiple.
- Include an idempotency key (e.g., session_id in data) if your sender may retry.
- Separate secrets from data: Pass secrets via environmentVariables, not in data.
- Log executionId from responses to trace runs in Definite.
- Set timeoutMs on your Python dataset (default is 6s, which is too short for most webhook processing).
Troubleshooting
| Status | Cause | Fix |
|---|---|---|
| 401 | Invalid or missing API key | Check API key format and Authorization header |
| 404 | Doc not found, archived, or belongs to a different team | Verify doc ID and that your API key has access |
| 422 | Invalid request body | Validate JSON structure |
| 5xx | Server error | Retry with exponential backoff |
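The 5xx advice can be sketched as a retry loop with doubling delays. The function name `post_with_backoff`, its defaults, and the `send` callable are hypothetical. The example uses a fake sender so it runs without a network call. A real sender might wrap `httpx.post(...)` and return `response.status_code`.

```python
import time
from typing import Callable


def post_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-5xx status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if status < 500:
            # 2xx and 4xx responses are returned as-is; retrying a 4xx would not help.
            return status
        if attempt < max_attempts - 1:
            # Delays double each time: base_delay, then 2x, then 4x.
            sleep(base_delay * 2 ** attempt)
    return status


# Dry run with a fake sender and a recording sleep function.
statuses = iter([503, 502, 200])
delays: list[float] = []
final = post_with_backoff(lambda: next(statuses), sleep=delays.append)
print(final, delays)
```

A retried request may run the datasets again, so this pairs naturally with the idempotency key recommended under Best practices.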