Kilo

Kilo Code should use a named OpenAI-compatible provider that points at Codex Pooler’s /v1 base URL. Kilo appends /chat/completions itself, so do not set baseURL to a full /v1/chat/completions endpoint.

Use the current npm package:

npm install -g @kilocode/cli@latest

For one-off use without a global install, run npx -y @kilocode/cli@latest.

For a deployed instance, add ~/.config/kilo/kilo.jsonc:

{
  "$schema": "https://app.kilo.ai/config.json",
  "model": "codex-pooler/gpt-5.5",
  "enabled_providers": ["codex-pooler"],
  "provider": {
    "codex-pooler": {
      "options": {
        "apiKey": "{env:CODEX_POOLER_API_KEY}",
        "baseURL": "https://codex-pooler.example.com/v1"
      },
      "models": {
        "gpt-5.5": {
          "name": "GPT-5.5 via Codex Pooler",
          "tool_call": true,
          "reasoning": true,
          "temperature": false,
          "attachment": true,
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          },
          "limit": {
            "context": 272000,
            "input": 228000,
            "output": 64000
          }
        }
      }
    }
  },
  "compaction": {
    "threshold_percent": 75
  }
}

For local setup, change baseURL to http://localhost:4000/v1.

{env:CODEX_POOLER_API_KEY} keeps the Pool API key outside the config file. Define only model ids your assigned Pool can serve. If you add Kilo permissions, use the object form such as "permission": {"bash": "allow"}; do not set "permission": "ask", which is not a valid Kilo config shape.
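The two rules above (key supplied through the environment, permissions in object form) can be checked before launching Kilo. The following is a minimal sketch, not part of Kilo: `check_permission_shape` and `api_key_is_set` are hypothetical helper names, and the config is assumed to be already parsed into a Python dict.

```python
import os


def check_permission_shape(config):
    """Return a list of problems with the optional "permission" entry.

    Hypothetical helper: it only checks that "permission", when present,
    is an object (dict) rather than a bare string such as "ask".
    """
    problems = []
    if "permission" in config and not isinstance(config["permission"], dict):
        problems.append('"permission" must be an object such as {"bash": "allow"}')
    return problems


def api_key_is_set(env=os.environ):
    """True when CODEX_POOLER_API_KEY is present and non-empty in env."""
    return bool(env.get("CODEX_POOLER_API_KEY", "").strip())


if __name__ == "__main__":
    print(check_permission_shape({"permission": {"bash": "allow"}}))
    print(check_permission_shape({"permission": "ask"}))
    print(api_key_is_set())
```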

Kilo uses OpenCode-style limit.{context,input,output} fields but includes reasoning tokens in overflow accounting and supports compaction.threshold_percent for preflight compaction. limit.input: 228000 leaves 208k usable input tokens after the default 20k reserve; the optional 75% threshold asks Kilo to compact even earlier.
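The budget arithmetic above can be written out as a short sketch. The limits and the 20k reserve come from this page; the function names are illustrative. This page does not say which limit the 75% threshold is measured against, so both candidates are computed and neither should be read as Kilo's actual behaviour.

```python
CONTEXT_LIMIT = 272_000   # limit.context from the config
INPUT_LIMIT = 228_000     # limit.input from the config
DEFAULT_RESERVE = 20_000  # default reserve stated in the text


def usable_input(input_limit, reserve=DEFAULT_RESERVE):
    """Input tokens left once the reserve is subtracted."""
    return input_limit - reserve


def threshold_tokens(base_limit, threshold_percent):
    """Token count at which a percentage threshold of base_limit is reached."""
    return base_limit * threshold_percent // 100


if __name__ == "__main__":
    print(usable_input(INPUT_LIMIT))            # 208000
    # ASSUMPTION: the base for threshold_percent is not documented here.
    print(threshold_tokens(CONTEXT_LIMIT, 75))  # 204000 if measured against limit.context
    print(threshold_tokens(INPUT_LIMIT, 75))    # 171000 if measured against limit.input
```

Under either reading the 75% mark falls below the 208k usable-input figure, which matches the statement that the threshold triggers compaction earlier.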

For GPT-5 OpenAI-compatible models, Kilo suppresses the outgoing max-token request field so that it does not send an incompatible max_tokens value. limit.output is therefore not forwarded, but it still matters for local context math and the UI.

Kilo is a chat-completions client. It sends model requests to POST /v1/chat/completions, and Codex Pooler translates supported chat-completions requests into Codex Responses work internally. Do not point Kilo at /backend-api/codex, /v1/responses, or /v1/chat/completions as the configured base URL.
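The base URL rule above can be illustrated with a small check. This is a sketch under stated assumptions: `check_base_url` and `chat_completions_url` are hypothetical helpers, not Kilo or Codex Pooler functions, and the path joining is a simplified model of how a client appends /chat/completions to the configured base.

```python
from urllib.parse import urlsplit

# Paths this page says must not be used as the configured base URL.
FORBIDDEN_SUFFIXES = ("/backend-api/codex", "/v1/responses", "/v1/chat/completions")


def check_base_url(base_url):
    """Return a list of problems; an empty list means the URL looks right."""
    path = urlsplit(base_url).path.rstrip("/")
    problems = []
    for suffix in FORBIDDEN_SUFFIXES:
        if path.endswith(suffix):
            problems.append("base URL must not end in " + suffix)
    if not problems and not path.endswith("/v1"):
        problems.append("base URL should end in /v1")
    return problems


def chat_completions_url(base_url):
    """Simplified model of the final request URL built from the base URL."""
    return base_url.rstrip("/") + "/chat/completions"


if __name__ == "__main__":
    for url in (
        "https://codex-pooler.example.com/v1",
        "https://codex-pooler.example.com/v1/chat/completions",
    ):
        print(url, check_base_url(url), chat_completions_url(url))
```

The second URL shows why a full endpoint is rejected: appending /chat/completions again would produce a doubled path.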

Run a tool-using prompt from an isolated directory:

mkdir -p /tmp/codex-pooler-kilo-check
cd /tmp/codex-pooler-kilo-check
export CODEX_POOLER_API_KEY=<pool-api-key>
kilo run \
  --model codex-pooler/gpt-5.5 \
  --pure \
  --auto \
  --format json \
  --dir "$PWD" \
  'Use your tools to create kilo-ok.txt containing exactly: kilo ok. After the file exists, reply with exactly: kilo ok'

--pure keeps external plugins out of the check. --auto is only for trusted isolated automation, because it lets Kilo approve tool permissions automatically.
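Once the run above finishes, the file the prompt asked for can be inspected directly rather than trusting the model's reply. A minimal sketch: `check_result` is a hypothetical helper, and tolerating a trailing newline is an assumption on top of the prompt's "exactly: kilo ok" wording.

```python
from pathlib import Path

EXPECTED = "kilo ok"  # content the prompt asks Kilo to write


def check_result(directory):
    """True when kilo-ok.txt exists in directory with the expected content."""
    target = Path(directory) / "kilo-ok.txt"
    if not target.is_file():
        return False
    # ASSUMPTION: a single trailing line ending is accepted as equivalent.
    return target.read_text(encoding="utf-8").rstrip("\r\n") == EXPECTED


if __name__ == "__main__":
    ok = check_result("/tmp/codex-pooler-kilo-check")
    print("check passed" if ok else "check failed")
```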

Kilo model requests use Codex Pooler’s narrow OpenAI-compatible /v1 support for selected SDK routes. Codex Pooler doesn’t provide full OpenAI API parity.

Codex Pooler model use does not require MCP. If you need operator metadata from /mcp, use a separate MCP-capable host and authenticate it with an operator-owned MCP token, not the Pool API key.