This guide walks you through a production-grade on-prem install of Definite on AWS. You’ll provision the infrastructure with Terraform, install the platform with the definite CLI, and tear it down cleanly when you’re done. End to end: about 30 to 40 minutes, most of which is terraform apply waiting on EKS and RDS.

What gets created

| Resource | Notes |
|---|---|
| VPC | Private subnets for nodes and the database, public subnets for NAT and load balancers |
| EKS cluster | Kubernetes 1.30 (configurable), managed node group, OIDC provider enabled for IRSA |
| RDS Postgres 15 | Private (not publicly accessible), reachable only from the cluster security group |
| S3 bucket | Lakehouse data; public access blocked, encryption and versioning on |
| IAM IRSA roles | Least-privilege roles for the lakehouse and for Fi’s Bedrock access |
| IAM user + access key | Static-credential fallback for the lakehouse (see IRSA vs access key) |
| gp3 StorageClass | Cluster-default StorageClass the lakehouse PVC needs |

Every name, instance size, CIDR, and count is a Terraform variable with a sane default. Override only what you need to in terraform.tfvars.

Prerequisites

1. AWS account access

You need AWS credentials in your shell with permission to create VPC, EKS, RDS, S3, and IAM resources. The Terraform provider reads the standard AWS credential chain (env vars, shared config, SSO, instance profile). No secrets live in the Terraform code.

2. Local tooling

Install the following on the machine you’ll run terraform and definite from:

| Tool | Version | Check |
|---|---|---|
| terraform | 1.5+ | terraform version |
| aws CLI | recent | aws --version |
| kubectl | 1.28+ | kubectl version --client |
| helm | 3.12+ | helm version |
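
The tooling check above can be scripted. The following is a minimal sketch in POSIX sh; the function name check_tools is hypothetical and not part of the definite CLI. It only verifies that each tool is on PATH and does not compare version numbers, so the version commands in the table are still worth running by hand.

```shell
#!/bin/sh
# check_tools: hypothetical helper, not part of the definite CLI.
# Prints the name of every tool that is NOT found on PATH, one per line.
# Returns 0 if all tools are present, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, the call would be check_tools terraform aws kubectl helm; no output and a zero exit status mean all four are installed.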

3. LLM access

Decide which LLM provider Fi will use. Bedrock is the most common choice for AWS deployments; this guide uses it as the default. Anthropic, Vertex, and Azure OpenAI are also supported. If you go with Bedrock, make sure you’ve requested model access for Claude in the target region.

4. (Optional) Remote Terraform state

For team use, configure an S3 backend in versions.tf before applying. A backend stub is committed in the module. Single-operator installs can skip this and use local state.

Phase 1: Provision AWS infrastructure with Terraform

Clone the on-prem repo and change into the AWS Terraform module:
git clone https://github.com/definite-app/definite-onprem
cd definite-onprem/deploy/terraform/aws

1. Configure inputs

Copy the example tfvars file and edit it:
cp terraform.tfvars.example terraform.tfvars
$EDITOR terraform.tfvars
A minimal terraform.tfvars is just two lines:
region      = "<your-region>"
name_prefix = "<your-name-prefix>"
Everything else has a default. Common overrides:
# Lock the EKS public API endpoint to your office or VPN CIDRs.
cluster_endpoint_public_access_cidrs = ["<your-cidr>/24"]

# Production hardening.
rds_multi_az            = true
rds_deletion_protection = true

# Bedrock model access (must be enabled in the AWS console first).
bedrock_model_ids = ["anthropic.claude-sonnet-4-20250514-v1:0"]
See the module’s README for the full list of variables.

2. Init, plan, apply

terraform init
terraform plan
terraform apply
terraform apply takes 20 to 25 minutes (mostly EKS and RDS). When it finishes, every value you need is in terraform output.
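
If you want apply to execute exactly the plan you reviewed, Terraform can save the plan to a file and apply that file later. The sketch below wraps the two commands in hypothetical helper functions (tf_plan and tf_apply are names made up for this example, as is the default file name tfplan); the -out and -input=false flags are standard Terraform flags.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the default plan file are assumptions.

# Save the plan to a file so that apply executes exactly what was reviewed.
tf_plan() {
  terraform plan -input=false -out="${1:-tfplan}"
}

# Apply a previously saved plan file. Terraform does not ask for
# confirmation when given a saved plan, so read the plan output first.
tf_apply() {
  terraform apply -input=false "${1:-tfplan}"
}
```

Run tf_plan, read the proposed changes, then run tf_apply. A saved plan file can contain sensitive values, so treat it like the state file and keep it out of version control.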

3. Read the outputs

Inspect the non-sensitive outputs:
terraform output
For sensitive values, read them explicitly with -raw:
terraform output -raw rds_password
terraform output -raw lakehouse_s3_secret_access_key
terraform output -raw postgres_url
The values you’ll feed into config.yaml are summarized here:

| Terraform output | Goes into |
|---|---|
| cluster_name | aws eks update-kubeconfig --name ... |
| region | object_store.region, llm.region |
| postgres_url (sensitive) | postgres.url |
| rds_password (sensitive) | POSTGRES_PASSWORD env var |
| lakehouse_bucket_name | object_store.bucket |
| lakehouse_prefix | lakehouse.prefix |
| lakehouse_s3_access_key_id | S3_ACCESS_KEY_ID env var |
| lakehouse_s3_secret_access_key (sensitive) | S3_SECRET_ACCESS_KEY env var |
| bedrock_irsa_role_arn | serviceAccount.annotations (Bedrock path) |

The module also emits a ready-to-paste, non-secret config.yaml snippet:
terraform output -raw config_yaml_fragment

IRSA vs access key

The module emits both an IRSA role and an IAM user + access key for lakehouse S3 access. Both share one identical least-privilege policy.

| Option | When to use |
|---|---|
| Access key (lakehouse_s3_access_key_id, lakehouse_s3_secret_access_key) | Use this today. The lakehouse reads S3 through DuckDB’s httpfs extension, which speaks the S3 API with static credentials, not the AWS SDK, so it can’t assume an IRSA role yet. |
| IRSA role (lakehouse_irsa_role_arn) | The end state. Once the lakehouse gains SDK-based S3 support, annotate the ServiceAccount with this role’s ARN and delete the IAM user. No infra change needed. |

Fi’s Bedrock access uses IRSA today (no static credentials).

Phase 2: Install Definite with the definite CLI

1. Install the CLI

curl -fsSL https://storage.googleapis.com/definite-public/definite-onprem/install.sh | sh
The install script detects your OS and architecture, downloads the matching prebuilt binary from a public Google Cloud Storage bucket, verifies its SHA256 checksum, and places definite on your PATH (default: $HOME/.local/bin). Binaries are published for macOS and Linux (arm64 and x86_64) and are uploaded by the release workflow using short-lived Workload Identity Federation credentials (no long-lived keys).

To pin a version, set DEFINITE_VERSION for the sh process that runs the script:
curl -fsSL https://storage.googleapis.com/definite-public/definite-onprem/install.sh \
  | DEFINITE_VERSION=v0.1.0 sh
Verify:
definite version

2. Point kubectl at the new cluster

aws eks update-kubeconfig \
  --region "$(terraform output -raw region)" \
  --name   "$(terraform output -raw cluster_name)"

kubectl get nodes
You should see your managed node group’s nodes in the Ready state.

3. Bootstrap cluster prerequisites

definite bootstrap installs the cluster-level pieces that definite init assumes are already present:

| Prerequisite | What it provides |
|---|---|
| Ingress controller | HTTP/S routing for the deployment’s Ingress resource |
| cert-manager (+ CRDs) | TLS certificate issuance for tls: cert_manager |
| letsencrypt-prod ClusterIssuer | The issuer the ingress references for automatic Let’s Encrypt certs |
| agent-sandbox CRDs | Custom resources the Fi runtime uses to dispatch per-run sandboxes |

Run it once against the fresh cluster:
definite bootstrap --acme-email you@yourcompany.com
Pass --dry-run to print what it would install without touching the cluster. The command is idempotent, so it is safe to re-run.

4. Discover the load balancer hostname

The ingress controller provisions an AWS Network Load Balancer. Wait for it to land, then grab its hostname:
kubectl get svc ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
You’ll get something like <random-id>.elb.<your-region>.amazonaws.com. You now have two choices for what to set as deployment.hostname in your config.yaml: a DNS name you control, with a CNAME record pointing at the LB hostname, or a nip.io host.
For demos and internal pilots, resolve the LB hostname to an IP and use a <ip-with-dashes>.nip.io host. No DNS configuration needed:
LB_HOST=$(kubectl get svc ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
LB_IP=$(dig +short "$LB_HOST" | head -1)
# Use a dedicated variable: bash sets HOSTNAME itself, so overwriting it can confuse other tools.
NIP_HOST="${LB_IP//./-}.nip.io"
echo "Use this as deployment.hostname: $NIP_HOST"
nip.io resolves any <ip-with-dashes>.nip.io name to the IP embedded in it, so Let’s Encrypt can issue a real cert with no extra setup.
Some OAuth providers (Google, Slack, HubSpot) reject nip.io redirect URIs. Use a real CNAME for production integrations.
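
The load balancer hostname is not available the moment the ingress controller starts, so the lookup in this step may return an empty string at first. One way to handle that is a small polling loop around the same kubectl command. This is a sketch in POSIX sh; wait_for_lb_hostname is a hypothetical helper, and the default of 30 attempts at 10-second intervals is an arbitrary assumption, not a documented provisioning time.

```shell
#!/bin/sh
# wait_for_lb_hostname: hypothetical helper, not part of the definite CLI.
# Polls the ingress-nginx Service until it reports a load balancer hostname.
# Args: max attempts (default 30), seconds between attempts (default 10).
# Prints the hostname and returns 0 on success; returns 1 on timeout.
wait_for_lb_hostname() {
  attempts="${1:-30}"
  delay="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    host=$(kubectl get svc ingress-nginx-controller -n ingress-nginx \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null) || host=""
    if [ -n "$host" ]; then
      echo "$host"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "load balancer hostname not assigned after $attempts attempts" >&2
  return 1
}
```

The result can be captured with LB_HOST=$(wait_for_lb_hostname) and then used either as the CNAME target or as input to the nip.io commands above.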

5. Build config.yaml

Start from the EKS example in the repo: examples/minimal-eks.yaml. The shape:
deployment:
  name: definite
  namespace: definite
  hostname: <your-hostname>          # from step 4
  tls: cert_manager

postgres:
  url: postgres://<rds-user>:${POSTGRES_PASSWORD}@<rds-endpoint>:5432/<db-name>

object_store:
  type: s3
  bucket: <bucket-name>              # terraform output lakehouse_bucket_name
  region: <your-region>              # terraform output region
  credentials:
    key_id:
      env: S3_ACCESS_KEY_ID
    secret:
      env: S3_SECRET_ACCESS_KEY

lakehouse:
  prefix: lake/                      # terraform output lakehouse_prefix
  storage:
    size: 50Gi
    storage_class_name: gp3

auth:
  mode: oidc                         # or `local` for username/password auth
  issuer: https://<your-okta-domain>
  client_id: definite-onprem
  client_secret:
    env: OIDC_CLIENT_SECRET

llm:
  provider: bedrock
  region: <your-region>
  model: anthropic.claude-sonnet-4-20250514-v1:0
  # Credentials via IRSA; no credentials block needed.

resources:
  api:
    replicas: 2
    cpu: "1"
    memory: 2Gi
  lakehouse:
    replicas: 1
    cpu: "4"
    memory: 16Gi
  frontend:
    replicas: 2
    cpu: 500m
    memory: 512Mi
  job_runner:
    replicas: 1
    cpu: 500m
    memory: 1Gi
The fastest way to fill in postgres.url, object_store.bucket, and friends is to paste the output of terraform output -raw config_yaml_fragment directly into your config.yaml.
For the full list of knobs (image registry overrides, ingress class, sandbox configuration, etc.), see the config reference.

6. Export secrets

config.yaml references env vars for every secret. Export them from Terraform outputs:
export POSTGRES_PASSWORD=$(terraform output -raw rds_password)
export S3_ACCESS_KEY_ID=$(terraform output -raw lakehouse_s3_access_key_id)
export S3_SECRET_ACCESS_KEY=$(terraform output -raw lakehouse_s3_secret_access_key)
export OIDC_CLIENT_SECRET=...        # only if auth.mode: oidc
Don’t commit config.yaml, terraform.tfvars, or terraform.tfstate to a public repo. The state file holds the RDS password and the S3 secret in cleartext.
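
A missing secret tends to surface later as a confusing connection or authentication error, so it can help to confirm the variables are populated before running the CLI. The following is a minimal sketch; require_env is a hypothetical helper, not something the definite CLI provides. It checks that each named variable is set and non-empty in the current shell. It does not check that the variable was exported, which the CLI needs.

```shell
#!/bin/sh
# require_env: hypothetical helper, not part of the definite CLI.
# Reports every named variable that is unset or empty (on stderr).
# Returns 0 if all are set, 1 otherwise. Only pass trusted variable names,
# because the name is expanded through eval.
require_env() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the config in this guide, the call would be require_env POSTGRES_PASSWORD S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY, plus OIDC_CLIENT_SECRET when auth.mode is oidc.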

7. Preflight with definite doctor

definite doctor --config config.yaml
doctor runs a battery of preflight checks: it connects to Postgres and runs SELECT version(), validates the Kubernetes context, checks object-store config shape, and (for Anthropic) pings the LLM API. Fix anything it flags before moving on.

8. Deploy with definite init

definite init --config config.yaml
init re-runs preflight, renders the bundled Helm chart with your values, and runs helm upgrade --install. It waits for pods to reach Ready by default. Useful flags:

| Flag | Purpose |
|---|---|
| --dry-run | Render values, don’t apply |
| --wait=false | Return as soon as Helm finishes; don’t wait for pods |
| --skip-preflight | Skip doctor (not recommended) |

Watch the rollout in another terminal:
definite status --config config.yaml
definite logs api --follow
When the pods are Ready and cert-manager has issued a cert, open your hostname in a browser and log in.
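
To block until cert-manager has finished rather than refreshing the browser, kubectl can wait on the Ready condition of the Certificate resources. This is a sketch under two assumptions: that the certificate lives in the deployment namespace (definite in the example config), and that a 10-minute timeout is long enough. wait_for_certificate is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for_certificate: hypothetical helper, not part of the definite CLI.
# Blocks until every cert-manager Certificate in the namespace is Ready.
# Args: namespace (default "definite"), timeout (default 600s).
# kubectl wait may fail immediately if no Certificate exists yet, in which
# case re-run it once the Ingress has been created.
wait_for_certificate() {
  kubectl wait --for=condition=Ready certificate --all \
    --namespace "${1:-definite}" --timeout="${2:-600s}"
}
```

If the wait times out, kubectl describe certificate in the same namespace shows the cert-manager error.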
Want Google SSO instead of local auth? See the Google SSO guide. You’ll add an auth.oidc block to your config.yaml and re-run definite upgrade; there is no need to redo the install.

Day-2 operations

The same CLI handles upgrades, logs, license, and lakehouse maintenance. A few of the most common commands:
definite status   --config config.yaml         # `kubectl get pods,svc,ingress` for the namespace
definite logs api --follow                     # stream component logs
definite upgrade  --config config.yaml         # re-render and re-apply with the current CLI version
definite license  show                         # decode + verify the active license
definite run maintenance stats                 # lakehouse file/snapshot stats
See the CLI reference for every command and flag.

Phase 3: Teardown

When you’re done, tear down the cluster and supporting infra with Terraform:
cd deploy/terraform/aws
terraform destroy
Two variables guard against accidental data loss. Only the bucket guard is active by default:

| Variable | Default | Effect |
|---|---|---|
| rds_deletion_protection | false (set to true for production) | When true, RDS refuses to be deleted; flip it to false first |
| lakehouse_force_destroy | false | Non-empty buckets won’t be deleted; flip it to true if you really mean to remove the bucket and its contents |

For production teardowns, snapshot RDS and back up the S3 bucket first; once Terraform deletes them, they’re gone.
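
The backup step can be sketched with the AWS CLI as follows. backup_before_destroy is a hypothetical helper, and its three arguments are assumptions you must supply: the RDS instance identifier (this guide does not list a Terraform output for it, so look it up in the console or with aws rds describe-db-instances), the bucket name from terraform output lakehouse_bucket_name, and a local destination directory. Note that aws s3 sync copies only the current version of each object, not older versions kept by bucket versioning.

```shell
#!/bin/sh
# backup_before_destroy: hypothetical helper, not part of the definite CLI.
# Args: <rds-instance-id> <bucket-name> <local-dest-dir>
# Takes a manual RDS snapshot, waits for it to become available, then
# copies the bucket's current objects to a local directory.
backup_before_destroy() {
  if [ "$#" -ne 3 ]; then
    echo "usage: backup_before_destroy <rds-instance-id> <bucket> <dest-dir>" >&2
    return 2
  fi
  db_instance_id="$1"
  bucket="$2"
  dest_dir="$3"
  # Timestamped snapshot name; the "-final-" naming is an arbitrary choice.
  snapshot_id="${db_instance_id}-final-$(date +%Y%m%d%H%M%S)"

  aws rds create-db-snapshot \
    --db-instance-identifier "$db_instance_id" \
    --db-snapshot-identifier "$snapshot_id" &&
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snapshot_id" &&
  aws s3 sync "s3://$bucket" "$dest_dir"
}
```

Each command runs only if the previous one succeeded, so a failed snapshot stops the function before anything else happens. Run terraform destroy only after it returns zero.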

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| terraform apply hangs on EKS | Cluster takes 15-20 min to provision; this is normal | Wait |
| Lakehouse pod stuck Pending on PVC | No default StorageClass | Confirm gp3 StorageClass is present: kubectl get storageclass. The module creates it by default; set create_gp3_storage_class = true if you disabled it |
| definite doctor Postgres check fails | RDS security group only allows the cluster security group | This is by design. Run doctor from a pod in the cluster, or temporarily allowlist your IP on the RDS security group |
| LB hostname never appears | Ingress controller isn’t running, or the AWS Load Balancer Controller fights with ingress-nginx | kubectl get pods -n ingress-nginx; if you installed both controllers, pick one |
| Cert never issues | letsencrypt-prod ClusterIssuer missing, or DNS doesn’t resolve to the LB | kubectl describe certificate -n definite shows the cert-manager error |
| Fi can’t reach Bedrock | The Bedrock IRSA ServiceAccount isn’t annotated, or the model isn’t enabled in the region | Confirm serviceAccount.annotations in config.yaml, and request model access in the AWS Bedrock console |
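
For the Bedrock row, the annotation can also be checked on the live cluster rather than only in config.yaml. IRSA uses the eks.amazonaws.com/role-arn annotation on the ServiceAccount. The sketch below lists every ServiceAccount in a namespace next to that annotation; list_irsa_annotations is a hypothetical helper, and the default namespace definite is taken from the example config.

```shell
#!/bin/sh
# list_irsa_annotations: hypothetical helper, not part of the definite CLI.
# Prints "<serviceaccount-name><TAB><role-arn>" for each ServiceAccount in
# the namespace (default "definite"). An empty second column means the
# ServiceAccount carries no IRSA annotation.
list_irsa_annotations() {
  kubectl get serviceaccounts --namespace "${1:-definite}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.eks\.amazonaws\.com/role-arn}{"\n"}{end}'
}
```

The ARN shown for the ServiceAccount Fi runs under should match terraform output bedrock_irsa_role_arn.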

Support

For issues or questions, contact hello@definite.app or open an issue on definite-app/definite-onprem.