Stream API

The Stream API lets you push JSON data directly into DuckLake tables. Send data via HTTPS, and Definite writes it to your data lake with automatic schema inference and partitioning.

How it works

Send a POST

POST JSON or NDJSON data to the Stream API endpoint with your target table.

Authenticate

Include your API key in the Authorization header.

Data lands in DuckLake

Definite writes your data to an Iceberg table with automatic schema handling.

Query immediately

Your data is available for querying right away.

Endpoint

POST https://api.definite.app/v2/stream

Authentication

Include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

You can find your API key in the user menu at the bottom left of the Definite app.
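Hard-coding the key in source files makes it easy to leak. The following is a minimal sketch of building the header from an environment variable instead. The variable name `DEFINITE_API_KEY` and the helper `auth_headers` are assumptions made for illustration, not names the Stream API defines.

```python
import os


def auth_headers(env_var: str = "DEFINITE_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    The variable name is an assumption for this sketch, not part of the API.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your Definite API key")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dict can be passed as the `headers` argument of an HTTP client call.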

Request Body

{
  "data": [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"}
  ],
  "config": {
    "table": "bronze.customers",
    "mode": "append",
    "wait": false,
    "tags": {
      "source": "my-app"
    }
  }
}

Fields

Field	Type	Required	Description
`data`	object or array	Yes	Single record or array of records to ingest
`config.table`	string	Yes	Target table in `schema.table` format (e.g., `bronze.events`)
`config.mode`	string	No	Ingestion mode. Only `append` is supported. Default: `append`
`config.wait`	boolean	No	Wait for commit and return snapshot ID. Default: `false`
`config.tags`	object	No	Optional metadata tags for tracing

Response

{
  "success": true,
  "request_id": "req_abc123def456",
  "stream_id": "st_xyz789ghi012",
  "table": "bronze.customers",
  "accepted": 2,
  "successful_rows": 2,
  "rejected_rows": 0,
  "partitions": ["2024-01-15"],
  "snapshot_id": null,
  "warnings": [],
  "errors": []
}

Response Fields

Field	Description
`success`	Whether the ingestion was successful
`request_id`	Unique identifier for this request
`stream_id`	Unique identifier for this stream
`table`	Fully qualified table name
`accepted`	Number of rows parsed and accepted
`successful_rows`	Number of rows successfully written
`rejected_rows`	Number of rows rejected due to validation
`partitions`	Human-friendly partition summary
`snapshot_id`	Iceberg snapshot ID (returned when `config.wait` is `true`, otherwise `null`)
`warnings`	Warning messages
`errors`	Error messages
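A caller will usually want to reduce these fields to a single status line for logging. The sketch below assumes the response has the shape shown above; `summarize_response` is a hypothetical helper, and because the element type of `errors` is not specified here, each entry is simply converted with `str()`.

```python
def summarize_response(result: dict) -> str:
    """Turn a Stream API response dict into a one-line status message."""
    if not result.get("success", False):
        errors = "; ".join(str(e) for e in result.get("errors", []))
        return f"failed: {errors or 'unknown error'}"
    written = result.get("successful_rows", 0)
    rejected = result.get("rejected_rows", 0)
    message = f"wrote {written} rows to {result.get('table')}"
    if rejected:
        message += f", {rejected} rejected"
    if result.get("snapshot_id") is not None:
        message += f" (snapshot {result['snapshot_id']})"
    return message
```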

Limits

Parameter	Limit	Description
Max payload size	10 MB	Maximum request body size
Max rows per request	50,000	Maximum number of records per request
Max field size	1 MB	Maximum size of any individual field
Max nested depth	10	Maximum JSON nesting depth
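Datasets larger than these limits need to be split across several requests. The following sketch shows one way to do that, assuming "10 MB" means 10 MiB and estimating size from each row's JSON encoding. `chunk_rows` is a hypothetical helper, and the estimate ignores the surrounding `data`/`config` envelope, so leave some headroom.

```python
import json

MAX_ROWS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # assumes "10 MB" means 10 MiB; adjust if not


def chunk_rows(rows, max_rows=MAX_ROWS, max_bytes=MAX_BYTES):
    """Yield lists of rows that stay within the row-count and size limits.

    A single row larger than max_bytes is still yielded on its own;
    the server would be expected to reject it.
    """
    batch, batch_bytes = [], 0
    for row in rows:
        # +1 accounts for the comma or newline separating records.
        row_bytes = len(json.dumps(row).encode("utf-8")) + 1
        if batch and (len(batch) >= max_rows or batch_bytes + row_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
```

Each yielded batch can then be sent as the `data` array of its own request.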

Content Types

The Stream API accepts:

JSON (application/json) - Single object or array of objects
NDJSON (application/x-ndjson) - Newline-delimited JSON
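For reference, an NDJSON body is one JSON object per line with a newline after each. The sketch below shows only the body encoding; `to_ndjson` is a hypothetical helper, and how the target table is supplied alongside an NDJSON body is not covered in this section.

```python
import json


def to_ndjson(rows) -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    body = "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)
    return body.encode("utf-8")
```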

Compression

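To reduce transfer time, compress the request body and set the Content-Encoding header to match the compression you used. Both gzip and zstd are accepted: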

Content-Encoding: gzip

Content-Encoding: zstd

Examples

Python

import httpx

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.definite.app/v2/stream"

def push_to_definite(table: str, rows: list[dict]) -> dict:
    """Push data to Definite Stream API"""

    payload = {
        "data": rows,
        "config": {
            "table": table,
            "mode": "append",
        }
    }

    response = httpx.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30.0
    )
    response.raise_for_status()
    return response.json()

# Example usage
result = push_to_definite(
    table="bronze.events",
    rows=[
        {"event_type": "page_view", "user_id": "123", "timestamp": "2024-01-15T10:30:00Z"},
        {"event_type": "click", "user_id": "123", "timestamp": "2024-01-15T10:31:00Z"},
    ]
)

print(f"Ingested {result['successful_rows']} rows")

Python with compression

import gzip
import json
import httpx

API_KEY = "YOUR_API_KEY"

def push_compressed(table: str, rows: list[dict]) -> dict:
    """Push gzip-compressed data to Definite"""

    payload = json.dumps({
        "data": rows,
        "config": {"table": table, "mode": "append"}
    }).encode("utf-8")

    compressed = gzip.compress(payload)

    response = httpx.post(
        "https://api.definite.app/v2/stream",
        content=compressed,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
        },
        timeout=30.0
    )
    response.raise_for_status()
    return response.json()

cURL

curl -X POST "https://api.definite.app/v2/stream" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      {"event_type": "signup", "user_id": "456"},
      {"event_type": "purchase", "user_id": "456", "amount": 99.99}
    ],
    "config": {
      "table": "bronze.events",
      "mode": "append"
    }
  }'

cURL with NDJSON

echo '{"event_type": "page_view", "user_id": "123"}
{"event_type": "click", "user_id": "123"}
{"event_type": "purchase", "user_id": "123"}' | \
curl -X POST "https://api.definite.app/v2/stream" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @-

Error Handling

HTTP Status Codes

Status	Meaning
`200`	Success - data ingested
`400`	Bad request - invalid JSON or schema
`401`	Unauthorized - invalid or missing API key
`413`	Payload too large - exceeds 10 MB limit
`429`	Rate limited - too many requests
`500`	Server error - retry with backoff

Retry Strategy

For transient errors (429, 5xx), implement exponential backoff:

import httpx
from httpx_retry import RetryTransport

transport = RetryTransport(
    retries=3,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504],
)

client = httpx.Client(transport=transport, timeout=30.0)
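If you prefer not to depend on a retry library, exponential backoff is short to write by hand. The sketch below uses only the standard library. The names `backoff_delays` and `send_with_retry` are hypothetical, and the retryable set is an assumption drawn from the status table above (429 plus common 5xx codes).

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(retries=3, base=0.5, cap=30.0):
    """Return the delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def send_with_retry(send, retries=3, base=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, for example a lambda wrapping httpx.post.
    """
    response = send()
    for delay in backoff_delays(retries, base):
        if response.status_code not in RETRYABLE:
            break
        # Random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        response = send()
    return response
```

Because a retried POST may be applied twice if the first attempt actually succeeded, this pairs naturally with a unique ID field on each record.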

Best Practices

Batch your data - Send multiple records per request (up to 50,000) rather than one at a time
Use compression - For large payloads, enable gzip compression to reduce transfer time
Handle partial failures - Check rejected_rows in the response; some rows may fail validation
Include idempotency keys - Add a unique ID field to your records for deduplication
Use appropriate tables - Organize data into logical tables (e.g., bronze.events, bronze.users)
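As an illustration of the idempotency-key practice above, the sketch below derives a deterministic ID from each record's content, so resending the same batch produces the same IDs and duplicates can be filtered downstream. The helper `with_record_ids` and the field name `record_id` are hypothetical choices, not part of the API. Note that two genuinely distinct events with identical content would share an ID, so include a timestamp or sequence number in the record where that matters.

```python
import hashlib
import json


def with_record_ids(rows, key="record_id"):
    """Return copies of rows with a content-derived ID added under `key`."""
    out = []
    for row in rows:
        # Sorting keys makes the hash independent of field order.
        canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        out.append({**row, key: digest})
    return out
```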

Related

Push-Based Data Ingestion - Run your own extractor for sensitive deployments
Webhooks - Trigger Definite blocks from external events
