Skip to main content
A deep reference for AI agents working with the Definite MCP server. Covers querying, dashboards, Python automation pipelines, the Definite SDK, and Slack/email messaging. The definite-app/claude-skills marketplace packages this reference as a Claude Code skill. It loads automatically on data, dashboard, and pipeline requests, so you never have to tell the agent to “use the Definite connector.”
/plugin marketplace add definite-app/claude-skills
/plugin install definite@definite
Team and Enterprise admins can distribute it organization-wide by connecting the repo in Claude Cowork via GitHub sync.

Download as CLAUDE.md

For Cursor or other agents, add the MCP server and pull this page in as context:
# Add Definite MCP server
claude mcp add definite --transport http https://api.definite.app/v3/mcp/http --header "Authorization: YOUR_API_KEY"

# Download this page as your local CLAUDE.md
curl -o CLAUDE.md https://docs.definite.app/mcp/agent-reference.md
Setup first: See MCP Server for installation and API key setup.

MCP Tool Reference

The full set of tools the Definite MCP server exposes. The sections below walk through the common workflows; this table is the complete index.

Querying & semantic models

ToolDescription
run_sql_queryRun a DuckDB SQL query against the DuckLake catalog.
run_cube_queryRun a Cube query against semantic models.
search_cube_schemaDiscover cubes, dimensions, and measures in the semantic layer.

Integrations & credentials

ToolDescription
collect_secretsSecurely collect integration credentials via a form. Returns a session ID to poll.
oauth_connectStart an OAuth flow for integrations that require it (HubSpot, Salesforce, Sheets, etc.).
check_credential_sessionPoll a credential or OAuth session for completion.
create_integrationCreate an integration from a completed credential session.
list_integrationsList the team’s integrations (sources and destinations).
get_integrationGet one integration’s type, validation status, and config.
get_database_schemaList schemas and tables available on a database integration.

Syncs

ToolDescription
configure_syncConfigure a scheduled sync from a source integration into DuckLake.
list_sync_runsList recent sync runs and their status.
get_sync_logsFetch logs for a specific sync run.

Docs (dashboards, reports, pipelines)

ToolDescription
get_doc_schemaGet the doc.yaml schema spec. Call before creating or updating docs.
list_docsList the team’s docs.
get_docGet a doc’s full YAML and parsed structure.
create_docCreate a dashboard, report, document, or pipeline from doc.yaml.
update_docUpdate an existing doc. Supports optimistic locking via expected_version.
update_doc_datasetUpdate a single dataset in a doc without resending the whole YAML.
delete_docArchive (soft-delete) a doc.
execute_docRun a doc’s datasets. Sync by default, or async for pipelines.
get_doc_execution_statusCheck the status of an async execute_doc run.
get_doc_execution_logsFetch logs for an async execute_doc run.
capture_doc_imageCapture a doc or dashboard as a JPEG image.

Semantic (Cube) models

ToolDescription
list_cube_modelsList semantic-model YAML files for an integration.
get_cube_modelGet one cube model YAML file and its etag.
save_cube_modelCreate or update a cube model YAML file (validated before save).
delete_cube_modelDelete a cube model YAML file.

Definite Drive

ToolDescription
list_drive_filesList files in the team’s Definite Drive.
read_drive_fileRead a UTF-8 text file from Drive.
write_drive_fileWrite a UTF-8 text file to Drive (best for data-app source files).
create_drive_folderCreate a folder marker in Drive.
delete_drive_fileDelete a file from Drive.
get_drive_download_urlGet a signed download URL (valid 1 hour).
get_drive_upload_urlGet a signed upload URL (valid 48 hours).
list_drive_file_versionsList retained GCS generations (edit history) of a Drive file.
restore_drive_fileRestore a Drive file to a prior generation.

Documentation

ToolDescription
search_docsSearch the Definite product documentation.
The server also exposes fi_panel and set_thread_metadata, which only function inside the Fi chat and are not used by external agents.

Querying Data

SQL (DuckDB)

run_sql_query(sql="SELECT table_schema, table_name FROM information_schema.tables WHERE table_catalog = 'LAKE' LIMIT 100")
DuckDB rules:
  • Always include LIMIT (default to 100-2000)
  • Use information_schema for discovery (not SHOW TABLES)
  • DuckLake catalog is uppercase: WHERE table_catalog = 'LAKE'
  • Table names use three-part format: LAKE.SCHEMA.table
  • JSON: -> returns JSON, ->> returns VARCHAR. Cast for math: (data ->> 'price')::DOUBLE

Cube (Semantic Models)

run_cube_query(cube_query={
  "measures": ["sales.revenue"],
  "dimensions": ["sales.region"],
  "timeDimensions": [{"dimension": "sales.order_date", "dateRange": "last 90 days", "granularity": "month"}],
  "order": [["sales.revenue", "desc"]],
  "limit": 1000
})
Use search_cube_schema to discover available cubes, dimensions, and measures.

When to Use Which

SQLCube
Row-level detailAggregated KPIs
Ad-hoc joins, schema discoveryTime series with rollups
No Cube model existsPre-defined business logic exists

Creating Dashboards & Reports

Use create_doc with YAML content. Call get_doc_schema for the full specification. The entire doc.yaml is passed as a single string in the raw_yaml parameter (not as separate kind / metadata / layout / datasets arguments). To edit an existing doc, use update_doc (or update_doc_dataset to change one dataset without resending the whole YAML).

Minimal Dashboard

version: 1
schemaVersion: "2025-01"
kind: dashboard

metadata:
  name: "Sales Dashboard"
  description: "Revenue by region"

layout:
  columns: 36
  tiles:
    - id: revenue_chart
      x: 0
      y: 0
      w: 18
      h: 10
      type: chart
      title: "Revenue by Region"
      source: revenue_data
    - id: total_kpi
      x: 18
      y: 0
      w: 9
      h: 6
      type: kpi
      source: total_revenue

datasets:
  revenue_data:
    engine: cube
    cube: sales
    measures:
      - sales.revenue
    dimensions:
      - sales.region
    viz:
      type: bar
  total_revenue:
    engine: sql
    sql: |
      SELECT SUM(amount) as total FROM LAKE.PUBLIC.sales
    viz:
      type: kpi

Document Kinds

KindPurpose
dashboardCharts, tables, KPIs
reportReport-focused layout
documentRich text with embedded data
pipelineCode execution + scheduling (layout optional)

Grid Layout (36-column system)

WidthColumnsUse
Fullw: 36Headers, wide charts
Halfw: 18Side-by-side charts
Thirdw: 12Three-column layout
Quarterw: 9KPI row (4 across)

Dataset Engines

EngineUse When
CubeSemantic model exists; aggregations, time series, business KPIs
SQLCustom queries, joins, schema discovery, no Cube model
PythonTransformations, API calls, combining datasets, ML

Viz Types

line, bar, area, pie, scatter, heatmap, gauge, kpi, map, funnel, sankey, table, code

Data Apps

For the full guide including component reference, best practices, and DuckDB WASM details, see Data Apps. Scaffold a new app with npx create-definite-app; templates and examples ship inside @definite-app/data-apps.
Data apps are source-authored React applications compiled to a single HTML file. They use an app.json manifest to declare data resources, which the platform fetches server-side and loads into browser DuckDB WASM for client-side querying.

When to Use Data Apps vs YAML Dashboards

YAML DashboardData App
Standard charts, tables, KPIsCustom interactive UI (expandable rows, conditional formatting)
Multiple tiles in a grid layoutSingle full-screen experience
Cube/SQL datasets with built-in vizCustom visualizations (ECharts, Perspective.js)

Architecture

app.json (manifest)  -->  Platform (server-side fetch)  -->  DuckDB WASM (browser)  -->  App.tsx (client-side SQL)
Each app has this structure:
apps-v2/my-app/
  app.json            # Manifest: declares data resources
  src/main.tsx        # Entry point (boilerplate)
  src/App.tsx         # UI code using runtime hooks
  dist/index.html     # Built artifact

Creating a Data App

Step 1: Create the manifest (app.json) declaring data resources with SQL queries and camelCase column aliases:
{
  "version": 2,
  "name": "Sales Dashboard",
  "entry": "src/main.tsx",
  "resources": {
    "sales": {
      "kind": "dataset",
      "source": {
        "type": "sql",
        "sql": "SELECT region, STRFTIME(order_date, '%Y-%m-%d') AS orderDate, amount::DOUBLE AS amount FROM LAKE.PUBLIC.orders LIMIT 200000"
      },
      "public": false
    }
  }
}
Step 2: Write src/App.tsx using runtime hooks (useDataset, useSqlQuery, useTheme) and the built-in component library (AppShell, KpiCard, EChart, PerspectivePanel, etc.).
import { useDataset, useSqlQuery, useTheme, AppShell, KpiCard } from "./definite-runtime";

export default function App() {
  const { theme, toggleTheme } = useTheme();
  const data = useDataset("sales");
  const kpis = useSqlQuery(data, data.tableRef
    ? `SELECT SUM(amount)::INTEGER AS revenue FROM ${data.tableRef}`
    : "", []);

  return (
    <AppShell title="Sales Dashboard" theme={theme} onToggleTheme={toggleTheme}>
      <KpiCard title="Revenue" value={kpis.data?.[0]?.revenue} format="currency" />
    </AppShell>
  );
}
Step 3: Build to produce dist/index.html, upload to Drive, and create a Doc:
version: 1
schemaVersion: "2025-01"
kind: dashboard
metadata:
  name: "Sales Dashboard"
datasets: {}
layout:
  columns: 36
  tiles:
    - id: app
      x: 0
      y: 0
      w: 36
      h: 22
      type: html
      fullScreen: true
      driveFile: "apps-v2/my-app/dist/index.html"

Key constraints

  • Always use type: "sql" resources (not type: "cube") for column name control
  • Column names in useSqlQuery must exactly match app.json SQL aliases
  • Cast SUM results to ::INTEGER in client-side SQL
  • Pre-compute complex CASE WHEN expressions in app.json SQL (DuckDB WASM has known issues with compound boolean expressions)

Python Automation Pipelines

The full workflow: write a Python script, upload to Drive, create a pipeline doc, schedule it.

Step 1: Write Your Script

Use uv inline script metadata for dependencies:
#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()

result = sql.execute("SELECT * FROM LAKE.PUBLIC.users WHERE created_at >= CURRENT_DATE - INTERVAL '1 day' LIMIT 2000")
rows = result.get("data", [])
print(f"Found {len(rows)} new users")
Both DEFINITE_API_KEY and DEFINITE_API_BASE_URL are auto-injected when the script runs on Definite.

Step 2: Upload to Drive

Use the get_drive_upload_url MCP tool, then upload with curl:
# 1. Call MCP tool: get_drive_upload_url(file_name="my_sync.py", folder="pipelines")
# 2. Upload using the returned signed URL:
curl -X PUT -T ./my_sync.py "SIGNED_UPLOAD_URL"
The file is now accessible at /home/user/drive/pipelines/my_sync.py when running on Definite.

Step 3: Create Pipeline Doc

Use create_doc with pipeline YAML:
version: 1
schemaVersion: "2025-01"
kind: pipeline

metadata:
  name: "Daily User Sync"

datasets:
  prepare_staging:
    engine: sql
    sql: |
      CREATE OR REPLACE TABLE LAKE.STAGING.daily_snapshot AS
      SELECT * FROM LAKE.PUBLIC.users
      WHERE created_at >= CURRENT_DATE - INTERVAL '1 day'

  run_sync:
    engine: python
    command: "uv run /home/user/drive/pipelines/my_sync.py"
    dependsOn: [prepare_staging]
    fiEnvId: "<_THREAD_ID>"
    timeoutMs: 300000

automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"

Key Pipeline Fields

FieldDescription
commandShell command to run (e.g., uv run /home/user/drive/script.py)
codeInline Python (alternative to command for short scripts)
dependsOnArray of dataset IDs that must complete first
fiEnvIdAlways set to "<_THREAD_ID>" for scheduled jobs. Ensures persistent sandbox with installed packages and drive access.
timeoutMsTimeout in ms. Default is 6000 (6s), which is too short. Use 300000 (5min) to 1800000 (30min).
inputsArray of dataset IDs available as DataFrame variables in Python

Automations

automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"     # Daily at 9am UTC
    destination:               # Optional: send snapshot
      type: slack
      channel_id: "C1234..."
Cron examples: "0 */4 * * *" (every 4h), "0 9 * * 1" (Mondays 9am), "0 0 1 * *" (1st of month)

Definite SDK

Install with uv add definite-sdk or use inline script dependencies as shown above.
from definite_sdk import DefiniteClient
import os

# In pipelines, env vars are auto-injected
# For local use, set DEFINITE_API_KEY env var or pass directly
client = DefiniteClient(
    os.environ["DEFINITE_API_KEY"],
    api_url=os.environ.get("DEFINITE_API_BASE_URL", "https://api.definite.app")
)
SDK source: github.com/luabase/definite_sdk

SQL & Cube Queries

sql_client = client.get_sql_client()
result = sql_client.execute("SELECT * FROM LAKE.PUBLIC.users LIMIT 10")

cube_query = {
    "measures": ["sales.revenue"],
    "timeDimensions": [{"dimension": "sales.date", "granularity": "month"}],
    "limit": 1000
}
result = sql_client.execute_cube_query(cube_query, integration_id="YOUR_CUBE_INTEGRATION_ID")

KV Store (Pipeline State)

state = client.get_kv_store('my_pipeline')  # Unique name per pipeline
last_sync = state.get('last_sync_at')
state['last_sync_at'] = '2026-01-31T00:00:00'
state.commit()  # Persist changes

Secret Store

secrets = client.secret_store()
secrets.set_secret("external_api_key", "sk-...")
key = secrets.get_secret("external_api_key")

Drive + DuckLake (server-side, preferred)

Upload local data to Drive and ingest via read_parquet — no local GCS credentials needed, bytes go client → GCS directly:
drive = client.get_drive_client()
sql = client.get_sql_client()

r = drive.write_temporary_file(parquet_bytes, name="events.parquet", ttl_days=7)
sql.execute(
    f"CREATE OR REPLACE TABLE LAKE.MY_SCHEMA.EVENTS AS "
    f"SELECT * FROM read_parquet('{r.gcs_path}')"
)

DuckLake — legacy local DuckDB attach

client.attach_ducklake() still works for teams with HMAC keys (pre-April 2026) but emits a DeprecationWarning and raises UnsupportedDuckLakeAttachError on newer teams. Prefer the Drive + SQL pattern above.
import duckdb
conn = duckdb.connect()
conn.execute(client.attach_ducklake())  # deprecated
df = conn.sql("SELECT * FROM lake.PUBLIC.users LIMIT 10").df()

Integration Store

integrations = list(client.integration_store().list_integrations(integration_type="slack"))
slack_id = integrations[0]["id"]

Slack & Email Messaging

Slack Messages (with Block Kit)

msg = client.get_message_client()

# Simple text message
msg.send_slack_message(
    integration_id="YOUR_SLACK_INTEGRATION_ID",
    channel_id="C1234567890",
    text="Hello from Definite!"
)

# Rich message with Slack Block Kit
msg.send_slack_message(
    integration_id="YOUR_SLACK_INTEGRATION_ID",
    channel_id="C1234567890",
    text="Daily report ready",  # Fallback for notifications
    blocks=[
        {"type": "header", "text": {"type": "plain_text", "text": "Daily Report"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*New users today:* 42\n*Revenue:* $12,500"}},
        {"type": "actions", "elements": [
            {
                "type": "button",
                "text": {"type": "plain_text", "text": "View Dashboard"},
                "url": "https://ui.definite.app/docs/DOC_ID",
                "style": "primary"
            }
        ]}
    ],
    thread_ts="1234567890.123456"  # Optional: reply to thread
)
Find your Slack integration ID: Use list_integrations(integration_type="slack") MCP tool.

Email Messages

msg = client.get_message_client()
msg.send_email_message(
    to_emails=["team@example.com"],
    subject="Daily Report: 42 new users",
    body="<h2>New Users</h2><p>42 users signed up today.</p>"  # HTML supported
)

Alert Workflows

Combine pipelines + SDK messaging for data-driven alerts.

Email Alert (Daily New Users)

#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()

result = sql.execute(
    "SELECT email, created_at FROM LAKE.PUBLIC.users "
    "WHERE created_at >= CURRENT_DATE - INTERVAL '1 day' LIMIT 2000"
)
rows = result.get("data", [])

if rows:
    table_rows = "".join(
        f"<tr><td>{r['email']}</td><td>{r['created_at']}</td></tr>" for r in rows
    )
    body = (
        f"<h2>New Users: {len(rows)}</h2>"
        f"<table><tr><th>Email</th><th>Signed Up</th></tr>{table_rows}</table>"
    )
    client.get_message_client().send_email_message(
        to_emails=["team@example.com"],
        subject=f"Daily: {len(rows)} new users",
        body=body,
    )

Threshold Alert (Slack with KV State)

#!/usr/bin/env -S uv run
# /// script
# dependencies = ["definite-sdk", "duckdb"]
# ///
from definite_sdk import DefiniteClient
import os

client = DefiniteClient(os.environ["DEFINITE_API_KEY"], api_url=os.environ["DEFINITE_API_BASE_URL"])
sql = client.get_sql_client()
state = client.get_kv_store('mrr_alert')

result = sql.execute(
    "SELECT SUM(amount) as mrr FROM LAKE.PUBLIC.subscriptions WHERE status = 'active' LIMIT 1"
)
current_mrr = float(result["data"][0]["mrr"])
previous_mrr = float(state.get('last_mrr', '0'))

if previous_mrr > 0:
    pct_change = (current_mrr - previous_mrr) / previous_mrr
    if abs(pct_change) >= 0.20:
        direction = "up" if pct_change > 0 else "down"
        client.get_message_client().send_slack_message(
            integration_id="YOUR_SLACK_INTEGRATION_ID",
            channel_id="C1234567890",
            text=f"MRR Alert: {direction} {abs(pct_change):.0%}",
            blocks=[
                {"type": "header", "text": {"type": "plain_text", "text": "MRR Alert"}},
                {
                    "type": "section",
                    "text": {
                        "type": "mrkdwn",
                        "text": f"MRR moved *{direction} {abs(pct_change):.0%}*\nFrom ${previous_mrr:,.0f} to ${current_mrr:,.0f}"
                    }
                },
            ],
        )

state['last_mrr'] = str(current_mrr)
state.commit()

Wrap in Pipeline Doc

Upload either script to Drive, then create a pipeline doc:
version: 1
schemaVersion: "2025-01"
kind: pipeline

metadata:
  name: "MRR Alert"

datasets:
  check_mrr:
    engine: python
    command: "uv run /home/user/drive/alerts/mrr_alert.py"
    fiEnvId: "<_THREAD_ID>"
    timeoutMs: 600000

automations:
  - trigger:
      type: schedule
      config:
        type: cron
        cron: "0 9 * * *"

HTML Output from Python Datasets

To render custom HTML (Plotly, D3, etc.) in a dashboard tile:
import os
with open(os.environ["DEFINITE_RESULT_TYPE_FILE"], "w") as f:
    f.write("html")
with open(os.environ["DEFINITE_RESULT_FILE"], "w") as f:
    f.write(html_content)
Pair with type: html tiles in your dashboard layout.

CLAUDE.md for Your Project

For the best experience with Claude Code, copy the Definite CLAUDE.md template into your project root alongside your .mcp.json. This gives Claude full context on all Definite MCP tools and patterns.