> ## Documentation Index
> Fetch the complete documentation index at: https://docs.definite.app/llms.txt
> Use this file to discover all available pages before exploring further.

# Agent Reference

> Complete reference for using Definite from Claude Code, Cursor, or any AI agent

A deep reference for AI agents working with the Definite MCP server. Covers querying, dashboards, Python automation pipelines, the Definite SDK, and Slack/email messaging.

### Install the Definite skill (recommended)

The [`definite-app/claude-skills`](https://github.com/definite-app/claude-skills)
marketplace packages this reference as a Claude Code skill. It loads automatically
on data, dashboard, and pipeline requests, so you never have to tell the agent to
"use the Definite connector."

```bash theme={null}
/plugin marketplace add definite-app/claude-skills
/plugin install definite@definite
```

Team and Enterprise admins can distribute it organization-wide by connecting the
repo in Claude Cowork via GitHub sync.

### Download as CLAUDE.md

For Cursor or other agents, add the MCP server and pull this page in as context:

```bash theme={null}
# Add Definite MCP server
claude mcp add definite --transport http https://api.definite.app/v3/mcp/http --header "Authorization: YOUR_API_KEY"

# Download this page as your local CLAUDE.md
curl -o CLAUDE.md https://docs.definite.app/mcp/agent-reference.md
```

<Note>
  **Setup first:** See [MCP Server](/mcp/mcp) for installation and API key setup.
</Note>

***

## MCP Tool Reference

The full set of tools the Definite MCP server exposes. The sections below walk through the
common workflows; this table is the complete index.

### Querying & semantic models

| Tool                 | Description                                                     |
| -------------------- | --------------------------------------------------------------- |
| `run_sql_query`      | Run a DuckDB SQL query against the DuckLake catalog.            |
| `run_cube_query`     | Run a Cube query against semantic models.                       |
| `search_cube_schema` | Discover cubes, dimensions, and measures in the semantic layer. |

### Integrations & credentials

| Tool                       | Description                                                                               |
| -------------------------- | ----------------------------------------------------------------------------------------- |
| `collect_secrets`          | Securely collect integration credentials via a form. Returns a session ID to poll.        |
| `oauth_connect`            | Start an OAuth flow for integrations that require it (HubSpot, Salesforce, Sheets, etc.). |
| `check_credential_session` | Poll a credential or OAuth session for completion.                                        |
| `create_integration`       | Create an integration from a completed credential session.                                |
| `list_integrations`        | List the team's integrations (sources and destinations).                                  |
| `get_integration`          | Get one integration's type, validation status, and config.                                |
| `get_database_schema`      | List schemas and tables available on a database integration.                              |

### Syncs

| Tool             | Description                                                         |
| ---------------- | ------------------------------------------------------------------- |
| `configure_sync` | Configure a scheduled sync from a source integration into DuckLake. |
| `list_sync_runs` | List recent sync runs and their status.                             |
| `get_sync_logs`  | Fetch logs for a specific sync run.                                 |

### Docs (dashboards, reports, pipelines)

| Tool                       | Description                                                                 |
| -------------------------- | --------------------------------------------------------------------------- |
| `get_doc_schema`           | Get the `doc.yaml` schema spec. Call before creating or updating docs.      |
| `list_docs`                | List the team's docs.                                                       |
| `get_doc`                  | Get a doc's full YAML and parsed structure.                                 |
| `create_doc`               | Create a dashboard, report, document, or pipeline from `doc.yaml`.          |
| `update_doc`               | Update an existing doc. Supports optimistic locking via `expected_version`. |
| `update_doc_dataset`       | Update a single dataset in a doc without resending the whole YAML.          |
| `delete_doc`               | Archive (soft-delete) a doc.                                                |
| `execute_doc`              | Run a doc's datasets. Sync by default, or async for pipelines.              |
| `get_doc_execution_status` | Check the status of an async `execute_doc` run.                             |
| `get_doc_execution_logs`   | Fetch logs for an async `execute_doc` run.                                  |
| `capture_doc_image`        | Capture a doc or dashboard as a JPEG image.                                 |

### Semantic (Cube) models

| Tool                | Description                                                      |
| ------------------- | ---------------------------------------------------------------- |
| `list_cube_models`  | List semantic-model YAML files for an integration.               |
| `get_cube_model`    | Get one cube model YAML file and its etag.                       |
| `save_cube_model`   | Create or update a cube model YAML file (validated before save). |
| `delete_cube_model` | Delete a cube model YAML file.                                   |

### Definite Drive

| Tool                       | Description                                                        |
| -------------------------- | ------------------------------------------------------------------ |
| `list_drive_files`         | List files in the team's Definite Drive.                           |
| `read_drive_file`          | Read a UTF-8 text file from Drive.                                 |
| `write_drive_file`         | Write a UTF-8 text file to Drive (best for data-app source files). |
| `create_drive_folder`      | Create a folder marker in Drive.                                   |
| `delete_drive_file`        | Delete a file from Drive.                                          |
| `get_drive_download_url`   | Get a signed download URL (valid 1 hour).                          |
| `get_drive_upload_url`     | Get a signed upload URL (valid 48 hours).                          |
| `list_drive_file_versions` | List retained GCS generations (edit history) of a Drive file.      |
| `restore_drive_file`       | Restore a Drive file to a prior generation.                        |

### Documentation

| Tool          | Description                                |
| ------------- | ------------------------------------------ |
| `search_docs` | Search the Definite product documentation. |

<Note>
  The server also exposes `fi_panel` and `set_thread_metadata`, which only function inside
  the Fi chat and are not used by external agents.
</Note>

***

## Querying Data

### SQL (DuckDB)

```
run_sql_query(sql="SELECT table_schema, table_name FROM information_schema.tables WHERE table_catalog = 'LAKE' LIMIT 100")
```

**DuckDB rules:**

* Always include `LIMIT` (default to 100-2000)
* Use `information_schema` for discovery (not `SHOW TABLES`)
* DuckLake catalog is uppercase: `WHERE table_catalog = 'LAKE'`
* Table names use three-part format: `LAKE.SCHEMA.table`
* JSON: `->` returns JSON, `->>` returns VARCHAR. Cast for math: `(data ->> 'price')::DOUBLE`

### Cube (Semantic Models)

```
run_cube_query(cube_query={
  "measures": ["sales.revenue"],
  "dimensions": ["sales.region"],
  "timeDimensions": [{"dimension": "sales.order_date", "dateRange": "last 90 days", "granularity": "month"}],
  "order": [["sales.revenue", "desc"]],
  "limit": 1000
})
```

Use `search_cube_schema` to discover available cubes, dimensions, and measures.

### When to Use Which

| SQL                            | Cube                              |
| ------------------------------ | --------------------------------- |
| Row-level detail               | Aggregated KPIs                   |
| Ad-hoc joins, schema discovery | Time series with rollups          |
| No Cube model exists           | Pre-defined business logic exists |

***

## Creating Dashboards & Reports

Use `create_doc` with YAML content. Call `get_doc_schema` for the full specification.

The entire `doc.yaml` is passed as a single string in the `raw_yaml` parameter (not as
separate `kind` / `metadata` / `layout` / `datasets` arguments). To edit an existing doc,
use `update_doc` (or `update_doc_dataset` to change one dataset without resending the whole YAML).

### Minimal Dashboard

```yaml theme={null}
version: 1
schemaVersion: "2025-01"
kind: dashboard

metadata:
  name: "Sales Dashboard"
  description: "Revenue by region"

layout:
  columns: 36
  tiles:
    - id: revenue_chart
      x: 0
      y: 0
      w: 18
      h: 10
      type: chart
      title: "Revenue by Region"
      source: revenue_data
    - id: total_kpi
      x: 18
      y: 0
      w: 9
      h: 6
      type: kpi
      source: total_revenue

datasets:
  revenue_data:
    engine: cube
    cube: sales
    measures:
      - sales.revenue
    dimensions:
      - sales.region
    viz:
      type: bar
  total_revenue:
    engine: sql
    sql: |
      SELECT SUM(amount) as total FROM LAKE.PUBLIC.sales
    viz:
      type: kpi
```

### Document Kinds

| Kind        | Purpose                                       |
| ----------- | --------------------------------------------- |
| `dashboard` | Charts, tables, KPIs                          |
| `report`    | Report-focused layout                         |
| `document`  | Rich text with embedded data                  |
| `pipeline`  | Code execution + scheduling (layout optional) |

### Grid Layout (36-column system)

| Width   | Columns | Use                  |
| ------- | ------- | -------------------- |
| Full    | `w: 36` | Headers, wide charts |
| Half    | `w: 18` | Side-by-side charts  |
| Third   | `w: 12` | Three-column layout  |
| Quarter | `w: 9`  | KPI row (4 across)   |

### Dataset Engines

| Engine     | Use When                                                        |
| ---------- | --------------------------------------------------------------- |
| **Cube**   | Semantic model exists; aggregations, time series, business KPIs |
| **SQL**    | Custom queries, joins, schema discovery, no Cube model          |
| **Python** | Transformations, API calls, combining datasets, ML              |

### Viz Types

`line`, `bar`, `area`, `pie`, `scatter`, `heatmap`, `gauge`, `kpi`, `map`, `funnel`, `sankey`, `table`, `code`

***

## Data Apps

<Note>
  For the full guide including component reference, best practices, and DuckDB WASM details, see [Data Apps](/analyze-build/docs/data-apps). Scaffold a new app with [`npx create-definite-app`](https://www.npmjs.com/package/create-definite-app); templates and examples ship inside [`@definite-app/data-apps`](https://www.npmjs.com/package/@definite-app/data-apps).
</Note>

Data apps are source-authored React applications compiled to a single HTML file. They use an `app.json` manifest to declare data resources, which the platform fetches server-side and loads into browser DuckDB WASM for client-side querying.

### When to Use Data Apps vs YAML Dashboards

| YAML Dashboard                      | Data App                                                        |
| ----------------------------------- | --------------------------------------------------------------- |
| Standard charts, tables, KPIs       | Custom interactive UI (expandable rows, conditional formatting) |
| Multiple tiles in a grid layout     | Single full-screen experience                                   |
| Cube/SQL datasets with built-in viz | Custom visualizations (ECharts, Perspective.js)                 |

### Architecture

```
app.json (manifest)  -->  Platform (server-side fetch)  -->  DuckDB WASM (browser)  -->  App.tsx (client-side SQL)
```

Each app has this structure:

```
apps-v2/my-app/
  app.json            # Manifest: declares data resources
  src/main.tsx        # Entry point (boilerplate)
  src/App.tsx         # UI code using runtime hooks
  dist/index.html     # Built artifact
```

### Creating a Data App

**Step 1:** Create the manifest (`app.json`) declaring data resources with SQL queries and camelCase column aliases:

```json theme={null}
{
  "version": 2,
  "name": "Sales Dashboard",
  "entry": "src/main.tsx",
  "resources": {
    "sales": {
      "kind": "dataset",
      "source": {
        "type": "sql",
        "sql": "SELECT region, STRFTIME(order_date, '%Y-%m-%d') AS orderDate, amount::DOUBLE AS amount FROM LAKE.PUBLIC.orders LIMIT 200000"
      },
      "public": false
    }
  }
}
```

**Step 2:** Write `src/App.tsx` using runtime hooks (`useDataset`, `useSqlQuery`, `useTheme`) and the built-in component library (`AppShell`, `KpiCard`, `EChart`, `PerspectivePanel`, etc.).

```tsx theme={null}
import { useDataset, useSqlQuery, useTheme, AppShell, KpiCard } from "./definite-runtime";

export default function App() {
  const { theme, toggleTheme } = useTheme();
  const data = useDataset("sales");
  const kpis = useSqlQuery(data, data.tableRef
    ? `SELECT SUM(amount)::INTEGER AS revenue FROM ${data.tableRef}`
    : "", []);

  return (
    <AppShell title="Sales Dashboard" theme={theme} onToggleTheme={toggleTheme}>
      <KpiCard title="Revenue" value={kpis.data?.[0]?.revenue} format="currency" />
    </AppShell>
  );
}
```

**Step 3:** Build to produce `dist/index.html`, upload to Drive, and create a Doc:

```yaml theme={null}
version: 1
schemaVersion: "2025-01"
kind: dashboard
metadata:
  name: "Sales Dashboard"
datasets: {}
layout:
  columns: 36
  tiles:
    - id: app
      x: 0
      y: 0
      w: 36
      h: 22
      type: html
      fullScreen: true
      driveFile: "apps-v2/my-app/dist/index.html"
```

### Key constraints

* Always use `type: "sql"` resources (not `type: "cube"`) for column name control
* Column names in `useSqlQuery` must exactly match `app.json` SQL aliases
* Cast `SUM` results to `::INTEGER` in client-side SQL
* Pre-compute complex `CASE WHEN` expressions in `app.json` SQL (DuckDB WASM has known issues with compound boolean expressions)

***

## Python Automation Pipelines

The full workflow: write a Python script, upload to Drive, create a pipeline doc, schedule it.

### Step 1: Write Your Script

Use uv inline script metadata for dependencies:

```python theme={null}
#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()

result = sql.execute("SELECT * FROM LAKE.PUBLIC.users WHERE created_at >= CURRENT_DATE - INTERVAL '1 day' LIMIT 2000")
rows = result.get("data", [])
print(f"Found {len(rows)} new users")
```

<Note>
  Both `DEFINITE_API_KEY` and `DEFINITE_API_BASE_URL` are auto-injected when the script runs on Definite.
</Note>

### Step 2: Upload to Drive

Use the `get_drive_upload_url` MCP tool, then upload with curl:

```bash theme={null}
# 1. Call MCP tool: get_drive_upload_url(file_name="my_sync.py", folder="pipelines")
# 2. Upload using the returned signed URL:
curl -X PUT -T ./my_sync.py "SIGNED_UPLOAD_URL"
```

The file is now accessible at `/home/user/drive/pipelines/my_sync.py` when running on Definite.

### Step 3: Create Pipeline Doc

Use `create_doc` with pipeline YAML:

```yaml theme={null}
version: 1
schemaVersion: "2025-01"
kind: pipeline

metadata:
  name: "Daily User Sync"

datasets:
  prepare_staging:
    engine: sql
    sql: |
      CREATE OR REPLACE TABLE LAKE.STAGING.daily_snapshot AS
      SELECT * FROM LAKE.PUBLIC.users
      WHERE created_at >= CURRENT_DATE - INTERVAL '1 day'

  run_sync:
    engine: python
    command: "uv run /home/user/drive/pipelines/my_sync.py"
    dependsOn: [prepare_staging]
    fiEnvId: "<_THREAD_ID>"
    timeoutMs: 300000

automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"
```

### Key Pipeline Fields

| Field       | Description                                                                                                                 |
| ----------- | --------------------------------------------------------------------------------------------------------------------------- |
| `command`   | Shell command to run (e.g., `uv run /home/user/drive/script.py`)                                                            |
| `code`      | Inline Python (alternative to `command` for short scripts)                                                                  |
| `dependsOn` | Array of dataset IDs that must complete first                                                                               |
| `fiEnvId`   | **Always set to `"<_THREAD_ID>"`** for scheduled jobs. Ensures persistent sandbox with installed packages and drive access. |
| `timeoutMs` | Timeout in ms. Default is 6000 (6s), which is too short. Use 300000 (5min) to 1800000 (30min).                              |
| `inputs`    | Array of dataset IDs available as DataFrame variables in Python                                                             |

### Automations

```yaml theme={null}
automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"     # Daily at 9am UTC
    destination:               # Optional: send snapshot
      type: slack
      channel_id: "C1234..."
```

**Cron examples:** `"0 */4 * * *"` (every 4h), `"0 9 * * 1"` (Mondays 9am), `"0 0 1 * *"` (1st of month)

***

## Definite SDK

Install with `uv add definite-sdk` or use inline script dependencies as shown above.

```python theme={null}
from definite_sdk import DefiniteClient
import os

# In pipelines, env vars are auto-injected
# For local use, set DEFINITE_API_KEY env var or pass directly
client = DefiniteClient(
    os.environ["DEFINITE_API_KEY"],
    api_url=os.environ.get("DEFINITE_API_BASE_URL", "https://api.definite.app")
)
```

**SDK source:** [github.com/luabase/definite\_sdk](https://github.com/luabase/definite_sdk)

### SQL & Cube Queries

```python theme={null}
sql_client = client.get_sql_client()
result = sql_client.execute("SELECT * FROM LAKE.PUBLIC.users LIMIT 10")

cube_query = {
    "measures": ["sales.revenue"],
    "timeDimensions": [{"dimension": "sales.date", "granularity": "month"}],
    "limit": 1000
}
result = sql_client.execute_cube_query(cube_query, integration_id="YOUR_CUBE_INTEGRATION_ID")
```

### KV Store (Pipeline State)

```python theme={null}
state = client.get_kv_store('my_pipeline')  # Unique name per pipeline
last_sync = state.get('last_sync_at')
state['last_sync_at'] = '2026-01-31T00:00:00'
state.commit()  # Persist changes
```

### Secret Store

```python theme={null}
secrets = client.secret_store()
secrets.set_secret("external_api_key", "sk-...")
key = secrets.get_secret("external_api_key")
```

### Drive + DuckLake (server-side, preferred)

Upload local data to Drive and ingest via `read_parquet` — no local GCS credentials needed, bytes go client → GCS directly:

```python theme={null}
drive = client.get_drive_client()
sql = client.get_sql_client()

r = drive.write_temporary_file(parquet_bytes, name="events.parquet", ttl_days=7)
sql.execute(
    f"CREATE OR REPLACE TABLE LAKE.MY_SCHEMA.EVENTS AS "
    f"SELECT * FROM read_parquet('{r.gcs_path}')"
)
```

### DuckLake — legacy local DuckDB attach

`client.attach_ducklake()` still works for teams with HMAC keys (pre-April 2026) but emits a `DeprecationWarning` and raises `UnsupportedDuckLakeAttachError` on newer teams. Prefer the Drive + SQL pattern above.

```python theme={null}
import duckdb
conn = duckdb.connect()
conn.execute(client.attach_ducklake())  # deprecated
df = conn.sql("SELECT * FROM lake.PUBLIC.users LIMIT 10").df()
```

### Integration Store

```python theme={null}
integrations = list(client.integration_store().list_integrations(integration_type="slack"))
slack_id = integrations[0]["id"]
```

***

## Slack & Email Messaging

### Slack Messages (with Block Kit)

```python theme={null}
msg = client.get_message_client()

# Simple text message
msg.send_slack_message(
    integration_id="YOUR_SLACK_INTEGRATION_ID",
    channel_id="C1234567890",
    text="Hello from Definite!"
)

# Rich message with Slack Block Kit
msg.send_slack_message(
    integration_id="YOUR_SLACK_INTEGRATION_ID",
    channel_id="C1234567890",
    text="Daily report ready",  # Fallback for notifications
    blocks=[
        {"type": "header", "text": {"type": "plain_text", "text": "Daily Report"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*New users today:* 42\n*Revenue:* $12,500"}},
        {"type": "actions", "elements": [
            {
                "type": "button",
                "text": {"type": "plain_text", "text": "View Dashboard"},
                "url": "https://ui.definite.app/docs/DOC_ID",
                "style": "primary"
            }
        ]}
    ],
    thread_ts="1234567890.123456"  # Optional: reply to thread
)
```

<Tip>
  **Find your Slack integration ID:** Use `list_integrations(integration_type="slack")` MCP tool.
</Tip>

### Email Messages

```python theme={null}
msg = client.get_message_client()
msg.send_email_message(
    to_emails=["team@example.com"],
    subject="Daily Report: 42 new users",
    body="<h2>New Users</h2><p>42 users signed up today.</p>"  # HTML supported
)
```

***

## Alert Workflows

Combine pipelines + SDK messaging for data-driven alerts.

### Email Alert (Daily New Users)

```python theme={null}
#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()

result = sql.execute(
    "SELECT email, created_at FROM LAKE.PUBLIC.users "
    "WHERE created_at >= CURRENT_DATE - INTERVAL '1 day' LIMIT 2000"
)
rows = result.get("data", [])

if rows:
    table_rows = "".join(
        f"<tr><td>{r['email']}</td><td>{r['created_at']}</td></tr>" for r in rows
    )
    body = (
        f"<h2>New Users: {len(rows)}</h2>"
        f"<table><tr><th>Email</th><th>Signed Up</th></tr>{table_rows}</table>"
    )
    client.get_message_client().send_email_message(
        to_emails=["team@example.com"],
        subject=f"Daily: {len(rows)} new users",
        body=body,
    )
```

### Threshold Alert (Slack with KV State)

```python theme={null}
#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()
state = client.get_kv_store('mrr_alert')

result = sql.execute(
    "SELECT SUM(amount) as mrr FROM LAKE.PUBLIC.subscriptions WHERE status = 'active' LIMIT 1"
)
current_mrr = float(result["data"][0]["mrr"])
previous_mrr = float(state.get('last_mrr', '0'))

if previous_mrr > 0:
    pct_change = (current_mrr - previous_mrr) / previous_mrr
    if abs(pct_change) >= 0.20:
        direction = "up" if pct_change > 0 else "down"
        client.get_message_client().send_slack_message(
            integration_id="YOUR_SLACK_INTEGRATION_ID",
            channel_id="C1234567890",
            text=f"MRR Alert: {direction} {abs(pct_change):.0%}",
            blocks=[
                {"type": "header", "text": {"type": "plain_text", "text": "MRR Alert"}},
                {
                    "type": "section",
                    "text": {
                        "type": "mrkdwn",
                        "text": f"MRR moved *{direction} {abs(pct_change):.0%}*\nFrom ${previous_mrr:,.0f} to ${current_mrr:,.0f}"
                    }
                },
            ],
        )

state['last_mrr'] = str(current_mrr)
state.commit()
```

### Wrap in Pipeline Doc

Upload either script to Drive, then create a pipeline doc:

```yaml theme={null}
version: 1
schemaVersion: "2025-01"
kind: pipeline

metadata:
  name: "MRR Alert"

datasets:
  check_mrr:
    engine: python
    command: "uv run /home/user/drive/alerts/mrr_alert.py"
    fiEnvId: "<_THREAD_ID>"
    timeoutMs: 600000

automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"
```

***

## HTML Output from Python Datasets

To render custom HTML (Plotly, D3, etc.) in a dashboard tile:

```python theme={null}
import os
with open(os.environ["DEFINITE_RESULT_TYPE_FILE"], "w") as f:
    f.write("html")
with open(os.environ["DEFINITE_RESULT_FILE"], "w") as f:
    f.write(html_content)
```

Pair with `type: html` tiles in your dashboard layout.

***

## CLAUDE.md for Your Project

For the best experience with Claude Code, copy the [Definite CLAUDE.md template](https://github.com/luabase/definite_sdk) into your project root alongside your `.mcp.json`. This gives Claude full context on all Definite MCP tools and patterns.
