# Monitoring and runtime triage

Codex Pooler exposes Prometheus metrics at `/metrics`. Metrics auth is managed from `/admin/system`. If a bearer token is configured there, Prometheus must send the matching token by referencing a Kubernetes Secret; do not put the raw token in Helm values.

Use monitoring for runtime evidence only. Dashboards, alerts, logs, tickets, and copied queries must not include prompts, response bodies, uploaded files, websocket frames, cookies, bearer tokens, upstream secrets, or raw Pool API keys.

## Collector path

The Helm chart can render a Prometheus Operator `ServiceMonitor` for the app service:

```yaml
monitoring:
  serviceMonitor:
    enabled: true
    labels:
      release: kube-prometheus-stack
    interval: 10s
    scrapeTimeout: 5s
```

The `release` label and scrape interval should match your own Prometheus Operator selectors and scrape budget. For fast OOM investigations, prefer a short interval for the app ServiceMonitor and keep broader Kubernetes collectors at their normal cadence. Worker and scheduler roles don't start the Prometheus reporter because they don't expose `/metrics`; use Kubernetes cgroup metrics and sampler logs for those pods.
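If a bearer token is configured in `/admin/system` and you write the ServiceMonitor yourself instead of through the chart, the Secret reference might look like the minimal sketch below. It uses the Prometheus Operator `authorization` field on a ServiceMonitor endpoint, which reads the token from a Secret in the ServiceMonitor's namespace. The Secret name `codex-pooler-metrics-token`, its `token` key, the Service selector label, and the `http` port name are hypothetical placeholders. Check the chart's own values for the supported way to wire this through Helm.

```yaml
# Sketch of a hand-written ServiceMonitor for the app service.
# Values marked "hypothetical" are placeholders, not chart defaults.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: codex-pooler-app
  namespace: codex-pooler
  labels:
    release: kube-prometheus-stack        # must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: codex-pooler   # hypothetical Service label
  endpoints:
    - port: http                          # hypothetical Service port name
      path: /metrics
      interval: 10s
      scrapeTimeout: 5s
      authorization:
        type: Bearer
        credentials:                      # token comes from a Secret, never inlined
          name: codex-pooler-metrics-token   # hypothetical Secret name
          key: token                         # hypothetical key holding the /admin/system token
```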

## Runtime triage dashboard

Build your Grafana or Prometheus dashboard around the signals needed to correlate runtime pressure across memory, request handling, gateway admission, database access, and restarts:

1. Kubernetes cgroup working set versus pod memory limit, RSS/cache split, and cgroup memory not explained by BEAM total
2. BEAM memory total, processes, binary, ETS, code, atom, atom used, and system memory by pod
3. BEAM process count, port count, total run queue, CPU run queue, and IO run queue
4. app restarts, OOM events, and each pod's last terminated reason
5. request rate, HTTP status class rate, and p95 endpoint/router latency
6. gateway admission accepted, queued, rejected, timed-out, and queue-time p95 signals
7. stream-buffer oversized and truncated events
8. Ecto query rate, queries per request, hot query sources by safe source and SQL command, unknown query rate, and DB queue p95

Read the dashboard as a correlation view. If cgroup memory climbs while BEAM total stays flat, the growth is memory the BEAM does not report as allocated, such as allocator carrier fragmentation, native allocations made outside the BEAM allocators, or active page cache counted in the working set. If `vm_memory_binary_bytes` climbs with cgroup memory, inspect streaming response retention, file bodies, and upstream transport buffering. If process count, ports, or run queue climb, inspect stuck request processes, websocket ownership, and overloaded route classes.

![Codex Pooler Grafana runtime triage dashboard](/codex-pooler-grafana.png)

[Download the starter Grafana dashboard JSON](/operators/monitoring/codex-pooler-runtime-triage.json)
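To load the downloaded JSON through Grafana file provisioning instead of importing it by hand, a dashboard provider along these lines should work. The provider name, folder, and directory path are assumptions; point `path` at whichever directory you mount into the Grafana container and place the JSON file there.

```yaml
# Grafana dashboard provider file, placed in Grafana's provisioning/dashboards directory.
apiVersion: 1
providers:
  - name: codex-pooler              # hypothetical provider name
    folder: Codex Pooler            # Grafana folder the dashboard appears in
    type: file
    disableDeletion: false
    allowUiUpdates: false           # keep the JSON file as the source of truth
    options:
      # hypothetical directory containing codex-pooler-runtime-triage.json
      path: /var/lib/grafana/dashboards/codex-pooler
```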

## Useful PromQL

```text
max by (pod) (
  container_memory_working_set_bytes{namespace="codex-pooler", container="app", image!=""}
)
```

```text
vm_memory_total_bytes{namespace="codex-pooler", job="codex-pooler-app"}
vm_memory_binary_bytes{namespace="codex-pooler", job="codex-pooler-app"}
vm_memory_processes_bytes{namespace="codex-pooler", job="codex-pooler-app"}
vm_memory_ets_bytes{namespace="codex-pooler", job="codex-pooler-app"}
vm_memory_system_bytes{namespace="codex-pooler", job="codex-pooler-app"}
```

```text
clamp_min(
  max by (pod) (
    container_memory_working_set_bytes{
      namespace="codex-pooler",
      container="app",
      image!=""
    }
  )
  - on (pod)
  max by (pod) (
    vm_memory_total_bytes{namespace="codex-pooler", job="codex-pooler-app"}
  ),
  0
)
```
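Because the unexplained-memory expression feeds several panels and is the first thing to check during an OOM investigation, it can be precomputed as a recording rule. This sketch is a plain Prometheus rule file. With Prometheus Operator, the same `groups` list goes under the `spec` of a `PrometheusRule` whose labels match your Prometheus `ruleSelector`. The group name and the recorded series name are hypothetical.

```yaml
groups:
  - name: codex-pooler-memory       # hypothetical group name
    rules:
      # Per-pod cgroup working set minus BEAM total, floored at 0.
      - record: pod:codex_pooler_unexplained_memory_bytes:max   # hypothetical series name
        expr: |
          clamp_min(
            max by (pod) (
              container_memory_working_set_bytes{namespace="codex-pooler", container="app", image!=""}
            )
            - on (pod)
            max by (pod) (
              vm_memory_total_bytes{namespace="codex-pooler", job="codex-pooler-app"}
            ),
            0
          )
```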

```text
increase(kube_pod_container_status_restarts_total{
  namespace="codex-pooler",
  exported_container="app"
}[15m])
```
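The restart query can also back alerts. This sketch pairs it with kube-state-metrics' `kube_pod_container_status_last_terminated_reason` to flag OOM kills. It assumes the same `exported_container` relabelling as the query above, and the alert names and `severity` values are hypothetical. Annotations are static text so no request data can reach notifications.

```yaml
groups:
  - name: codex-pooler-restarts
    rules:
      - alert: CodexPoolerAppRestarted          # hypothetical alert name
        expr: |
          increase(kube_pod_container_status_restarts_total{
            namespace="codex-pooler",
            exported_container="app"
          }[15m]) > 0
        labels:
          severity: warning                     # hypothetical routing label
        annotations:
          summary: Codex Pooler app container restarted in the last 15 minutes
      # The last-terminated-reason series persists after recovery, so only
      # fire when a recent restart coincides with an OOMKilled reason.
      - alert: CodexPoolerAppOOMKilled          # hypothetical alert name
        expr: |
          (
            increase(kube_pod_container_status_restarts_total{
              namespace="codex-pooler",
              exported_container="app"
            }[15m]) > 0
          )
          and ignoring (reason)
          (
            kube_pod_container_status_last_terminated_reason{
              namespace="codex-pooler",
              exported_container="app",
              reason="OOMKilled"
            } == 1
          )
        labels:
          severity: critical
        annotations:
          summary: Codex Pooler app container was restarted after an OOM kill
```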

```text
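rate(codex_pooler_gateway_stream_buffer_oversized_count_total{namespace="codex-pooler", job="codex-pooler-app"}[5m])
rate(codex_pooler_gateway_stream_buffer_truncated_count_total{namespace="codex-pooler", job="codex-pooler-app"}[5m])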
```
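If sustained stream-buffer truncation should notify someone, a rule like the following is one option. The alert name, severity, and 10-minute `for` window are assumptions. The metric name is copied from the query above.

```yaml
groups:
  - name: codex-pooler-stream-buffers
    rules:
      - alert: CodexPoolerStreamBufferTruncated   # hypothetical alert name
        # Fires when the 5m truncation rate stays above zero for 10 minutes.
        expr: |
          sum by (pod) (
            rate(codex_pooler_gateway_stream_buffer_truncated_count_total{
              namespace="codex-pooler",
              job="codex-pooler-app"
            }[5m])
          ) > 0
        for: 10m
        labels:
          severity: warning                       # hypothetical routing label
        annotations:
          summary: Codex Pooler is truncating gateway stream buffers
```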

```text
sum by (pod) (
  rate(codex_pooler_repo_query_count{namespace="codex-pooler", job="codex-pooler-app"}[5m])
)
```

```text
topk(10,
  sum by (source, command) (
    rate(codex_pooler_repo_query_count{namespace="codex-pooler", job="codex-pooler-app"}[5m])
  )
)
```

```text
sum by (pod, command) (
  rate(codex_pooler_repo_query_count{
    namespace="codex-pooler",
    job="codex-pooler-app",
    source="unknown"
  }[5m])
)
```

```text
histogram_quantile(0.95,
  sum by (le, source, command) (
    rate(codex_pooler_repo_query_total_time_seconds_bucket{
      namespace="codex-pooler",
      job="codex-pooler-app"
    }[5m])
  )
)
```

```text
sum by (method, status_class) (
  rate(codex_pooler_http_request_count{
    namespace="codex-pooler",
    job="codex-pooler-app"
  }[5m])
)
```

```text
sum by (route_class, transport) (
  rate(codex_pooler_gateway_admission_enqueued_count{
    namespace="codex-pooler",
    job="codex-pooler-app"
  }[5m])
)
```

```text
histogram_quantile(0.95,
  sum by (le, route_class, transport) (
    rate(codex_pooler_gateway_admission_dequeued_time_seconds_bucket{
      namespace="codex-pooler",
      job="codex-pooler-app"
    }[5m])
  )
)
```

```text
histogram_quantile(0.95,
  sum by (le, pod) (
    rate(codex_pooler_repo_query_queue_time_seconds_bucket{
      namespace="codex-pooler",
      job="codex-pooler-app"
    }[5m])
  )
)
```
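The DB queue p95 query can likewise drive an alert. In this sketch the 0.1 s threshold is a placeholder, not a recommended value; derive a real one from your own baseline. The alert name and severity are hypothetical.

```yaml
groups:
  - name: codex-pooler-db
    rules:
      - alert: CodexPoolerDBQueueSlow             # hypothetical alert name
        # Placeholder threshold: p95 pool queue time above 0.1 seconds for 10 minutes.
        expr: |
          histogram_quantile(0.95,
            sum by (le, pod) (
              rate(codex_pooler_repo_query_queue_time_seconds_bucket{
                namespace="codex-pooler",
                job="codex-pooler-app"
              }[5m])
            )
          ) > 0.1
        for: 10m
        labels:
          severity: warning                       # hypothetical routing label
        annotations:
          summary: Codex Pooler p95 DB queue time is elevated
```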

## Memory sampler logs

The in-process memory sampler is enabled by default in every release role. When BEAM total memory or cgroup usage crosses the configured threshold, it logs a sanitized snapshot with role metadata, memory categories, cgroup memory stats, process and port counts, top processes by memory, top processes by message queue length, and top ETS tables by memory. It never logs ETS table contents, messages, request bodies, prompts, bearer tokens, websocket frames, or upstream payloads.

Emergency tuning environment variables:

```bash
CODEX_POOLER_MEMORY_SAMPLER_ENABLED=true
CODEX_POOLER_MEMORY_SAMPLER_THRESHOLD_RATIO=0.70
CODEX_POOLER_MEMORY_SAMPLER_MIN_INTERVAL_MS=60000
CODEX_POOLER_MEMORY_SAMPLER_TOP_PROCESSES=20
CODEX_POOLER_MEMORY_SAMPLER_TOP_ETS_TABLES=20
CODEX_POOLER_MEMORY_SAMPLER_LIMIT_BYTES=1073741824
```

Lower the threshold or shorten the interval only during an active investigation, because both settings increase log volume. During an investigation the extra volume is worth it: sampler logs are the only signal likely to capture a worker or scheduler spike that reaches OOM before the next Prometheus scrape. Worker and scheduler pods do not expose the app `/metrics` endpoint and do not start the Prometheus reporter, so combine their sampler logs with Kubernetes cgroup memory, restart, OOM, and memory-limit metrics.
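During an investigation, the overrides can be applied to a worker or scheduler Deployment as a strategic-merge patch, for example with `kubectl patch deployment <name> -n codex-pooler --patch-file sampler-patch.yaml`. The container name `worker` and the specific values are assumptions. Kubernetes env values must be strings, so numbers are quoted.

```yaml
# sampler-patch.yaml: strategic-merge patch for a worker or scheduler Deployment.
spec:
  template:
    spec:
      containers:
        - name: worker      # hypothetical container name; containers merge by name
          env:              # env entries merge by name
            - name: CODEX_POOLER_MEMORY_SAMPLER_THRESHOLD_RATIO
              value: "0.50"     # example lower threshold for the investigation
            - name: CODEX_POOLER_MEMORY_SAMPLER_MIN_INTERVAL_MS
              value: "15000"    # example shorter interval (15 seconds)
```

Changing the pod template triggers a rollout, so the new settings apply to fresh pods. Remove the overrides once the investigation ends.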