# Context
How kern builds the prompt the model sees on each turn.
## System prompt
Reloaded on every message. Composed from the agent's repo files and runtime state:
| Section | Tag | Source |
|---|---|---|
| Agent behavior | `<document path="AGENTS.md">` | Repo file |
| Agent identity | `<document path="IDENTITY.md">` | Repo file |
| Runtime docs | `<document path="KERN.md">` | Template from kern |
| Knowledge index | `<document path="KNOWLEDGE.md">` | Repo file |
| Paired users | `<document path="USERS.md">` | Repo file |
| Latest daily note | `<document path="notes/...">` | Most recent `notes/` file |
| Notes summary | `<notes_summary>` | LLM summary of the previous 5 daily notes, cached in DB |
| Available tools | `<tools>` | Based on `toolScope` config |
| Conversation summary | `<conversation_summary>` | Compressed summaries from segments (only when messages are trimmed) |
Changes to notes or knowledge files are picked up immediately — no restart needed.
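The table above can be sketched as a simple composition step. This is an illustrative helper, not kern's actual code; the type and function names are assumptions:

```typescript
// Sketch of per-turn system prompt assembly (hypothetical helper names;
// kern's real composition logic may differ).
interface PromptSection {
  tag: string;  // e.g. 'document path="AGENTS.md"' or 'tools'
  body: string; // section content, re-read on every message
}

function composeSystemPrompt(sections: PromptSection[]): string {
  return sections
    // Skip empty sections so a missing optional file (e.g. USERS.md)
    // leaves no trace in the prompt.
    .filter((s) => s.body.trim().length > 0)
    // Wrap each section in its XML tag; the closing tag uses only the
    // tag name, dropping attributes like path="...".
    .map((s) => `<${s.tag}>\n${s.body}\n</${s.tag.split(" ")[0]}>`)
    .join("\n\n");
}

const prompt = composeSystemPrompt([
  { tag: 'document path="AGENTS.md"', body: "Be concise." },
  { tag: "tools", body: "" }, // empty: skipped
]);
```

Because the files are read on every turn, editing a section's source file changes the next prompt with no restart.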
## Context window
`maxContextTokens` (default 100000) sets the token budget. When the full message history exceeds it, the oldest messages are trimmed from the front. Nothing is lost: the full history stays in session JSONL files.
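A minimal sketch of this front-trimming, assuming per-message token counts are already tracked (kern's real bookkeeping is more involved):

```typescript
// Drop oldest messages until the remainder fits the budget. Dropped
// messages stay in the session JSONL on disk; only the in-context
// window shrinks.
interface Msg { role: string; tokens: number; text: string }

function trimToBudget(history: Msg[], maxContextTokens: number): Msg[] {
  let total = history.reduce((sum, m) => sum + m.tokens, 0);
  let start = 0;
  while (total > maxContextTokens && start < history.length) {
    total -= history[start].tokens; // trim from the front (oldest first)
    start++;
  }
  return history.slice(start);
}
```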
## Tool result truncation
Tool results (command output, file contents, web pages) can be large. `maxToolResultChars` (default 20000) truncates oversized results in context while keeping the full output in session storage and the recall index.
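A sketch of the truncation step; the marker text appended to cut results is illustrative, not kern's exact format:

```typescript
// Truncate an oversized tool result for the in-context copy. The full
// output is still written to session storage and the recall index.
function truncateToolResult(output: string, maxToolResultChars = 20000): string {
  if (output.length <= maxToolResultChars) return output;
  const omitted = output.length - maxToolResultChars;
  return output.slice(0, maxToolResultChars) + `\n[truncated ${omitted} chars]`;
}
```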
## Token budget allocation
The total budget is split between:
- Conversation summary — the `summaryBudget` fraction (default 0.75 = 75%) goes to compressed conversation summaries from segments. This portion is cached via prompt caching on supported models (Anthropic), making it effectively free.
- Raw messages — the remaining budget goes to the actual conversation messages.
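The split above is a single fraction over the total budget. A sketch, with names mirroring the config keys described in this section:

```typescript
// Divide the total context budget between summaries and raw messages.
// summaryBudget = 0 disables the summary portion entirely.
function splitBudget(maxContextTokens: number, summaryBudget = 0.75) {
  const summaryTokens = Math.floor(maxContextTokens * summaryBudget);
  return { summaryTokens, rawTokens: maxContextTokens - summaryTokens };
}
```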
## Conversation summary
When old messages are trimmed, the agent loses direct access to that history. Segments solve this by injecting compressed summaries of the trimmed region.
Pipeline:
- Segmentation — messages are grouped into semantic segments (L0) based on embedding similarity. Topic shifts create boundaries. Runs incrementally after each turn.
- Summarization — each segment is summarized by an LLM (~10-20:1 compression).
- Rollup — every 10 L0 segments are rolled up into an L1 parent. 10 L1s → L2, etc. This builds a hierarchical tree.
- Injection — `composeHistory` fills the summary budget with summaries from the tree, using breadth-first expansion: highest-level segments expand first (L2→L1 before L1→L0) for balanced coverage across the full history. The trim boundary is snapped to L0 segment edges, then walked back to the nearest user message so boundaries are turn-safe.
The result is a `<conversation_summary>` block in the system prompt:
```xml
<conversation_summary>
<summary>
level: L2
messages: 0-4500
...high-level summary of early conversation...
</summary>
<summary>
level: L0
messages: 20766-20793
first: 2026-04-03T06:39:26.634Z
last: 2026-04-03T06:44:25.437Z
...detailed recent summary...
</summary>
</conversation_summary>
```
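The breadth-first expansion in the injection step can be sketched as a greedy loop: start from the top-level summaries and replace a parent with its children whenever the children still fit the budget. The types and the exact greedy order are assumptions, not kern's `composeHistory`:

```typescript
// Expand the summary tree breadth-first under a token budget: highest
// levels expand first (L2 -> L1 before L1 -> L0), so coverage stays
// balanced across the whole history.
interface Segment { level: number; tokens: number; children: Segment[] }

function expandSummaries(roots: Segment[], budget: number): Segment[] {
  const chosen = [...roots];
  let used = chosen.reduce((s, seg) => s + seg.tokens, 0);
  let expanded = true;
  while (expanded) {
    expanded = false;
    // Try the highest-level segments first.
    chosen.sort((a, b) => b.level - a.level);
    for (let i = 0; i < chosen.length; i++) {
      const seg = chosen[i];
      if (seg.children.length === 0) continue; // leaf: nothing to expand
      const childTokens = seg.children.reduce((s, c) => s + c.tokens, 0);
      // Replace parent with children only if the result still fits.
      if (used - seg.tokens + childTokens <= budget) {
        used += childTokens - seg.tokens;
        chosen.splice(i, 1, ...seg.children);
        expanded = true;
        break; // re-sort and retry after each expansion
      }
    }
  }
  return chosen;
}
```

With a generous budget the tree bottoms out at detailed L0 summaries; with a tight one, older history stays compressed at L1/L2 while recent history expands.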
Requires: recall enabled (segmentation uses embeddings). Controlled by `summaryBudget` — set to 0 to disable.
## Prompt caching
Anthropic models via OpenRouter use explicit `cache_control: ephemeral` markers to enable server-side prefix caching (~90% cost reduction on cached tokens). See Caching for the full design including cache breakpoints, trim snapping, and provider differences.
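A sketch of marking the stable prefix cacheable with Anthropic-style `cache_control` blocks. The request shape is abbreviated and the two-breakpoint layout is an assumption about how kern places them:

```typescript
// Anthropic caches everything up to and including a block marked with
// cache_control: { type: "ephemeral" }; OpenRouter passes these through.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function buildSystemBlocks(staticPrompt: string, summaries: string): TextBlock[] {
  return [
    // Breakpoint 1: the stable prefix (agent files, runtime docs).
    { type: "text", text: staticPrompt, cache_control: { type: "ephemeral" } },
    // Breakpoint 2: summaries change only when segments change, so
    // cached tokens are ~90% cheaper on reuse.
    { type: "text", text: summaries, cache_control: { type: "ephemeral" } },
  ];
}
```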
Segmentation thresholds:
- Triggers when 10+ unsegmented messages AND 10k+ unsegmented tokens accumulate
- Targets ~15k tokens per segment, minimum 5k tokens and 10 messages
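The trigger condition above is a simple conjunction; a sketch, assuming the runtime tracks both counters:

```typescript
// Segmentation runs only once BOTH thresholds are crossed:
// 10+ unsegmented messages AND 10k+ unsegmented tokens.
function shouldSegment(unsegmentedMessages: number, unsegmentedTokens: number): boolean {
  return unsegmentedMessages >= 10 && unsegmentedTokens >= 10_000;
}
```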
## Auto-recall
When `autoRecall: true`, the runtime automatically searches past conversations before each turn and injects relevant results as a `<recall>` block at the top of the message list. This is ephemeral — not persisted to the session.
Only fires when messages have been trimmed (long sessions). Capped at ~2000 tokens. See Tools → recall for the manual search tool.
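A sketch of the gating and capping described above. `hits` stands in for recall search output, and the 4-chars-per-token estimate used for the cap is an assumption:

```typescript
// Auto-recall gate: fires only when enabled AND the session is long
// enough that messages have been trimmed. Results are capped before
// injection; the block is prepended for this turn only, never persisted.
function maybeAutoRecall(
  autoRecall: boolean,
  messagesTrimmed: boolean,
  hits: string[],
  tokenCap = 2000,
): string | null {
  if (!autoRecall || !messagesTrimmed) return null;
  let body = "";
  for (const hit of hits) {
    // Rough 4-chars-per-token heuristic to stay under ~2000 tokens.
    if ((body.length + hit.length) / 4 > tokenCap) break;
    body += hit + "\n";
  }
  return body ? `<recall>\n${body}</recall>` : null;
}
```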
## Inspection
The web UI includes a Memory overlay with five tabs:
- Sessions — session list with message counts, durations, role breakdowns, and activity charts. Live session indicator. Click any session to expand.
- Segments — hierarchical segment tree (L0/L1/L2). Click any segment to see its markdown summary, token compression stats, and metadata. Filter by "All" or "In context". Collapsible rolled-up groups for child segments.
- Notes — notes summaries with regeneration. Rendered as markdown.
- Recall — stats (indexed messages, chunks, sessions, date range) and semantic search.
- Context — structured view of the full system prompt. Parses XML tags into collapsible colored sections with token cost bars. Shows real token breakdown (system + summary + messages).
## API endpoints
- `GET /context/system` — the full composed system prompt as text
- `GET /context/segments` — segment IDs and metadata currently injected by `composeHistory()`
- `GET /sessions` — session list with `currentSessionId` for live session identification
- `GET /recall/stats` — recall index stats (messages, chunks, sessions, date range)