How it works
Data lands in DuckLake
Definite writes your data to an Iceberg table with automatic schema handling.
Endpoint
POST https://api.definite.app/v2/stream
Authentication
Include your API key in the Authorization header.
Request Body
Append
Merge (upsert)
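As a rough sketch, the two request bodies might be built in Python as shown here. It assumes the dotted names in the Fields table (`config.table`, `config.mode`, and so on) map to a nested `config` object. The record fields (`event_id`, `user_id`, `email`) and their values are made up for illustration.

```python
import json

# Append (the default mode): every row is inserted as-is.
append_body = {
    "data": [
        {"event_id": "evt_1", "type": "click"},
        {"event_id": "evt_2", "type": "view"},
    ],
    "config": {"table": "bronze.events"},
}

# Merge (upsert): rows are keyed on exactly one primary-key column.
merge_body = {
    "data": {"user_id": 42, "email": "ada@example.com"},  # a single record is also accepted
    "config": {
        "table": "bronze.users",
        "mode": "merge",
        "primary_key": ["user_id"],
    },
}

print(json.dumps(append_body, indent=2))
print(json.dumps(merge_body, indent=2))
```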
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `data` | object or array | Yes | Single record or array of records to ingest |
| `config.table` | string | Yes | Target table in `schema.table` format (e.g., `bronze.events`) |
| `config.mode` | string | No | Ingestion mode: `append` (default) or `merge`. `merge` requires `primary_key`. |
| `config.primary_key` | array of string | Required when `mode="merge"` | List of exactly one column name to key the upsert on. Composite keys are not yet supported. |
| `config.wait` | boolean | No | Wait for commit and return snapshot ID. Default: `false` |
| `config.tags` | object | No | Optional metadata tags for tracing |
Ingestion modes
append
Inserts every row into the target table as-is. This is the default.
merge
A classic upsert/merge keyed on primary_key. Existing rows whose PK matches an incoming row are replaced; new rows are inserted. Use this when your source has a stable unique key.
Constraints:
- `primary_key` must be a list of exactly one column (v2 limitation; composite keys are planned).
- Every incoming row must include the PK column with a non-null value.
- Within-batch duplicate PK values cause the whole batch to be rejected, so dedupe upstream before sending.
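The constraints above can be checked client-side before a batch is sent. This is a minimal sketch, not part of the API: `check_merge_batch` and `dedupe_keep_last` are hypothetical helper names, and keeping the last occurrence of a duplicate key is one possible policy that your source may or may not want.

```python
def check_merge_batch(rows, primary_key):
    """Return a list of problems that would violate the merge constraints."""
    if not isinstance(primary_key, list) or len(primary_key) != 1:
        return ["primary_key must be a list of exactly one column"]
    key = primary_key[0]
    problems = []
    seen = set()  # assumes PK values are hashable scalars (str, int, ...)
    for index, row in enumerate(rows):
        value = row.get(key)
        if value is None:
            problems.append(f"row {index}: missing or null {key!r}")
        elif value in seen:
            problems.append(f"row {index}: duplicate {key!r} value {value!r}")
        else:
            seen.add(value)
    return problems


def dedupe_keep_last(rows, key):
    """Collapse rows that share a key value, keeping the last one seen."""
    return list({row[key]: row for row in rows}.values())


rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": None}, {"name": "x"}]
print(check_merge_batch(rows, ["id"]))
print(dedupe_keep_last(rows[:2], "id"))
```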
Response
Response Fields
| Field | Description |
|---|---|
| `success` | Whether the ingestion was successful |
| `request_id` | Unique identifier for this request |
| `stream_id` | Unique identifier for this stream |
| `table` | Fully qualified table name |
| `accepted` | Number of rows parsed and accepted |
| `successful_rows` | Number of rows successfully written |
| `rejected_rows` | Number of rows rejected due to validation |
| `partitions` | Human-friendly partition summary |
| `snapshot_id` | Iceberg snapshot ID (present when `wait=true`) |
| `warnings` | Warning messages |
| `errors` | Error messages |
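One way to act on these fields is to treat a response as fully successful only when `success` is true and `rejected_rows` is zero. The sketch below assumes `warnings` and `errors` are lists of strings, which the table does not state. `summarize_response` is a hypothetical helper and the sample body is invented for illustration.

```python
def summarize_response(body):
    """Return (ok, message) for a parsed Stream API response body."""
    written = body.get("successful_rows", 0)
    rejected = body.get("rejected_rows", 0)
    ok = bool(body.get("success")) and rejected == 0
    parts = [f"{written} written", f"{rejected} rejected"]
    if body.get("snapshot_id") is not None:  # only present when wait=true
        parts.append(f"snapshot {body['snapshot_id']}")
    for warning in body.get("warnings") or []:
        parts.append(f"warning: {warning}")
    for error in body.get("errors") or []:
        parts.append(f"error: {error}")
    return ok, ", ".join(parts)


# Invented sample: 3 rows accepted, 1 of them rejected during validation.
sample = {
    "success": True,
    "accepted": 3,
    "successful_rows": 2,
    "rejected_rows": 1,
    "warnings": [],
    "errors": ["row 2 failed validation"],
}
ok, message = summarize_response(sample)
print(ok, message)
```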
Table format
config.table accepts two forms:
- "namespace.table" (e.g., "bronze.events")
- "LAKE.namespace.table" (e.g., "LAKE.bronze.events")
If the namespace or table does not exist, the request returns 404.
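A client can check the shape of a table name before sending. This sketch assumes the `LAKE` prefix is an optional qualifier in front of the same `namespace.table` pair and that it is matched case-sensitively; neither is confirmed here. `split_table_name` is a hypothetical helper.

```python
def split_table_name(name):
    """Return (namespace, table) for either accepted form, or raise ValueError."""
    parts = name.split(".")
    if len(parts) == 3 and parts[0] == "LAKE":
        parts = parts[1:]  # drop the optional LAKE prefix
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'namespace.table' or 'LAKE.namespace.table', got {name!r}")
    return parts[0], parts[1]


print(split_table_name("bronze.events"))
print(split_table_name("LAKE.bronze.events"))
```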
Type coercion
Incoming values are cast to each target column's declared type at write time, so the wire format is forgiving:
- Mixed types in one column - an `int` and a `string` in the same `VARCHAR` column both insert cleanly as strings.
- Boolean caveat - Python `True`/`False` becomes `TRUE`/`FALSE` in a `BOOLEAN` column, but lowercase `'true'`/`'false'` in a `VARCHAR` column. Send a string literal if case matters.
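The boolean caveat follows from how Python serializes booleans: `json.dumps` writes `True` as the JSON literal `true`, so the server never sees Python's capitalized spelling. The column names `is_active` and `label` are made up for illustration; `label` stands in for a `VARCHAR` column.

```python
import json

# A Python bool goes over the wire as the JSON literal true.
row = {"is_active": True, "label": True}
wire = json.dumps(row)
print(wire)  # {"is_active": true, "label": true}

# To keep a particular casing in a VARCHAR column, send a string instead.
row_cased = {"is_active": True, "label": str(True)}
wire_cased = json.dumps(row_cased)
print(wire_cased)  # {"is_active": true, "label": "True"}
```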
Limits
| Parameter | Limit | Description |
|---|---|---|
| Max payload size | 10 MB | Maximum request body size |
| Max rows per request | 50,000 | Maximum number of records per request |
| Max field size | 1 MB | Maximum size of any individual field |
| Max nested depth | 10 | Maximum JSON nesting depth |
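A large dataset can be split client-side so that each request stays under the row and payload limits. This is a minimal sketch under stated assumptions: it reads "10 MB" conservatively as 10,000,000 bytes, it measures only the `data` array (the `config` envelope adds a little more, so leave headroom), and it assumes the body is sent with the same compact separators used for measuring. `batch_records` is a hypothetical helper.

```python
import json

MAX_ROWS = 50_000
MAX_BYTES = 10_000_000  # conservative reading of the 10 MB payload limit


def batch_records(records, max_rows=MAX_ROWS, max_bytes=MAX_BYTES):
    """Yield lists of records whose compact JSON array fits both limits."""
    batch, size = [], 2  # 2 bytes for the surrounding "[" and "]"
    for record in records:
        # +1 accounts for the comma that separates array elements.
        encoded = len(json.dumps(record, separators=(",", ":")).encode("utf-8")) + 1
        if batch and (len(batch) >= max_rows or size + encoded > max_bytes):
            yield batch
            batch, size = [], 2
        # A single record larger than max_bytes still ends up alone in a batch.
        batch.append(record)
        size += encoded
    if batch:
        yield batch


batches = list(batch_records([{"a": i} for i in range(5)], max_rows=2))
print([len(b) for b in batches])  # [2, 2, 1]
```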
Content Types
The Stream API accepts:
- JSON (`application/json`) - Single object or array of objects
- NDJSON (`application/x-ndjson`) - Newline-delimited JSON
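An NDJSON body is one compact JSON object per line. The sketch below only shows the serialization; how `config` is supplied alongside an NDJSON body is not covered here, so the helper deliberately leaves it out. `to_ndjson` is a hypothetical name.

```python
import json


def to_ndjson(records):
    """Encode records as newline-delimited JSON bytes, one object per line."""
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "".join(line + "\n" for line in lines).encode("utf-8")


body = to_ndjson([{"event_id": "evt_1"}, {"event_id": "evt_2"}])
print(body.decode("utf-8"), end="")
```

`json.dumps` escapes any newline inside a string value, so each record is guaranteed to stay on a single line.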
Compression
You can compress your payload (for example with gzip) to reduce transfer time.
Examples
Python
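A minimal append request using only the standard library might look like this. Assumptions to verify against your account: the `Bearer` scheme in the Authorization header (the page only says the key goes in that header), the nested `config` object, and the placeholder key and record. `build_stream_request` is a hypothetical helper; the request is built but not sent.

```python
import json
import urllib.request

STREAM_URL = "https://api.definite.app/v2/stream"


def build_stream_request(api_key, records, table):
    """Build (but do not send) a POST request that appends records to a table."""
    body = json.dumps({"data": records, "config": {"table": table}}).encode("utf-8")
    return urllib.request.Request(
        STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer scheme; adjust if your key uses another format.
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_stream_request("YOUR_API_KEY", [{"event_id": "evt_1"}], "bronze.events")

# To actually send it:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```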
Python merge (upsert)
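For an upsert, the body carries `mode` and a single-column `primary_key`, and `wait` can be set so the response includes `snapshot_id`. This sketch assumes the nested `config` shape; `build_merge_body` is a hypothetical helper and the record is invented.

```python
import json


def build_merge_body(records, table, key_column, wait=False):
    """Build a merge (upsert) request body keyed on one column."""
    config = {"table": table, "mode": "merge", "primary_key": [key_column]}
    if wait:
        config["wait"] = True  # wait for commit so the response carries snapshot_id
    return {"data": records, "config": config}


merge_request_body = build_merge_body(
    [{"user_id": 42, "plan": "pro"}], "bronze.users", "user_id", wait=True
)
encoded = json.dumps(merge_request_body).encode("utf-8")  # bytes to POST to the endpoint
print(encoded.decode("utf-8"))
```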
Python with compression
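A gzip-compressed request could be sketched as follows. The `Content-Encoding: gzip` header is the usual HTTP convention for a compressed body and is an assumption here, as is the `Bearer` scheme. `build_gzip_request` is a hypothetical helper; the request is built but not sent.

```python
import gzip
import json
import urllib.request


def build_gzip_request(api_key, payload):
    """Build (but do not send) a POST request with a gzip-compressed JSON body."""
    raw = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        "https://api.definite.app/v2/stream",
        data=gzip.compress(raw),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # assumption: standard HTTP convention
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer scheme
        },
    )


payload = {
    "data": [{"event_id": f"evt_{i}", "type": "click"} for i in range(1000)],
    "config": {"table": "bronze.events"},
}
gzip_request = build_gzip_request("YOUR_API_KEY", payload)
print(len(json.dumps(payload)), "bytes raw,", len(gzip_request.data), "bytes compressed")
```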
cURL
cURL with NDJSON
Error Handling
HTTP Status Codes
| Status | Meaning |
|---|---|
| 200 | Success - data ingested |
| 400 | Bad request - invalid JSON or schema |
| 401 | Unauthorized - invalid or missing API key |
| 404 | Table not found - namespace or table does not exist |
| 413 | Payload too large - exceeds 10 MB limit |
| 422 | Invalid request - e.g., `mode="merge"` without `primary_key` |
| 429 | Rate limited - too many requests |
| 500 | Server error - retry with backoff |
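Of the codes above, 429 and the 5xx range are transient and worth retrying; the rest indicate a problem with the request itself. A minimal exponential-backoff sketch follows. The attempt count, base delay, and jitter are illustrative defaults, and `send` is a hypothetical zero-argument callable that performs the request and returns `(status_code, body)`.

```python
import random
import time


def is_transient(status):
    """429 and 5xx responses are worth retrying; other codes are not."""
    return status == 429 or 500 <= status <= 599


def send_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-transient status or attempts run out."""
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if not is_transient(status) or attempt >= max_attempts:
            return status, body
        delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
        sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


# Usage with a fake sender that fails twice and then succeeds.
responses = iter([(429, None), (503, None), (200, {"success": True})])
delays = []
result = send_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, len(delays))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.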
Retry Strategy
For transient errors (429, 5xx), implement exponential backoff.
Best Practices
- Batch your data - Send multiple records per request (up to 50,000) rather than one at a time
- Use compression - For large payloads, enable gzip compression to reduce transfer time
- Handle partial failures - Check `rejected_rows` in the response; some rows may fail validation
- Use `mode="merge"` for upserts - When your source has a stable unique key, use `mode="merge"` with `primary_key` instead of appending duplicates and deduping downstream
- Use appropriate tables - Organize data into logical tables (e.g., `bronze.events`, `bronze.users`)
Related
- Push-Based Data Ingestion - Run your own extractor for sensitive deployments
- Webhooks - Trigger Definite blocks from external events

