Execution Runtime Model
Synced from github.com/CoWork-OS/CoWork-OS/docs
This document describes the current execution model for planning, step execution, follow-up turns, delegation, and verification guidance in CoWork OS.
The model is built around four rules:
- tool guidance should live with the tool definition instead of being duplicated across prompts
- prompt assembly should be sectioned, budgeted, and cache-aware
- delegated work should receive a structured brief instead of a raw prompt passthrough
- verification guidance should appear before finalization, not only after a post-hoc gate fires
Tool Prompt Layer
Internal tool definitions now carry prompt-render metadata before provider serialization.
LLMTool includes optional internal prompting metadata, and one shared render path produces both:
- the final description sent to the active LLM provider
- the compact tool text used in planning
Provider adapters still receive only:
- name
- rendered description
- input_schema
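A minimal sketch of this shape, assuming illustrative names (ToolPrompting, renderToolPrompt, maxDescriptionChars are not the actual CoWork OS identifiers): one render function produces both the provider-facing description and the compact planning text from the same metadata.

```typescript
// Hypothetical prompt-render metadata on an internal tool definition.
interface ToolPrompting {
  summary: string;             // one-line purpose, reused as planning text
  guidance?: string;           // usage guidance folded into the description
  maxDescriptionChars?: number; // cap so provider tool arrays stay bounded
}

interface LLMTool {
  name: string;
  inputSchema: object;
  prompting?: ToolPrompting;
}

// One shared render path produces both artifacts.
function renderToolPrompt(tool: LLMTool): { description: string; planningText: string } {
  const p = tool.prompting;
  if (!p) return { description: "", planningText: tool.name };
  let description = p.guidance ? `${p.summary}\n${p.guidance}` : p.summary;
  const cap = p.maxDescriptionChars ?? 400;
  if (description.length > cap) description = description.slice(0, cap);
  return { description, planningText: `${tool.name}: ${p.summary}` };
}
```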
Render timing
Tool prompts are rendered only after:
- policy filtering
- mode filtering
- task allowlist and restriction filtering
- adaptive tool-availability filtering
This keeps hidden tools from consuming prompt budget.
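The ordering above can be sketched as a simple filter pipeline, where rendering happens only on the survivors. Function and filter names here are assumptions for illustration:

```typescript
// Apply visibility filters in order; only surviving tools are rendered,
// so hidden tools never consume prompt budget.
type Tool = { name: string };
type Filter = (tools: Tool[]) => Tool[];

function visibleTools(all: Tool[], filters: Filter[]): Tool[] {
  // policy -> mode -> allowlist/restriction -> adaptive availability
  return filters.reduce((tools, f) => f(tools), all);
}
```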
Render cache
SessionRuntime.getAvailableTools() caches the rendered visible tool set. The cache key includes:
- tool catalog version
- disabled, restricted, and allowlisted tool state
- execution mode
- task domain
- web-search mode
- shell availability
- agent type
- worker role
- user-input allowance
- visible tool names
That makes the rendered tool array stable for repeated turns while still invalidating when visibility or execution context changes.
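A sketch of such a cache key, assuming illustrative field names: every input in the list above is folded into one string, so any visibility or execution-context change yields a different key while incidental ordering does not.

```typescript
// Hypothetical render-cache key over the visibility inputs listed above.
interface RenderCacheInputs {
  catalogVersion: number;
  disabledTools: string[];
  restrictedTools: string[];
  allowlistedTools: string[];
  executionMode: string;
  taskDomain: string;
  webSearchMode: string;
  shellAvailable: boolean;
  agentType: string;
  workerRole: string;
  userInputAllowed: boolean;
  visibleToolNames: string[];
}

function renderCacheKey(i: RenderCacheInputs): string {
  // Sorted copies make the key insensitive to incidental ordering.
  const sorted = (xs: string[]) => [...xs].sort().join(",");
  return [
    i.catalogVersion, sorted(i.disabledTools), sorted(i.restrictedTools),
    sorted(i.allowlistedTools), i.executionMode, i.taskDomain,
    i.webSearchMode, i.shellAvailable, i.agentType, i.workerRole,
    i.userInputAllowed, sorted(i.visibleToolNames),
  ].join("|");
}
```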
Prompt-aware rollout tools
The tool-prompt layer currently enriches the highest-impact tool families first:
- spawn_agent
- orchestrate_agents
- run_command
- web_search
- web_fetch
- browser navigation/content/text/screenshot tools
- request_user_input
- task_list_create
- task_list_update
- create_diagram
- qa_run
Descriptions are intentionally capped so provider tool arrays do not grow without bound.
Prompt Stack
Execution prompts are assembled from named sections instead of a single monolithic string.
The shared pieces are:
- executor-prompt-sections.ts
- ContentBuilder
- QueryOrchestrator
Planning, step execution, and follow-up turns all use the same section-composition primitives even when the section mix differs.
Section scopes
Each section declares a cache scope:
- session: stable content that can be memoized across turns
- turn: dynamic content that is recomputed when inputs change
Examples of stable session-scoped sections:
- base identity
- shared safety core
- autonomy or input gate
- workspace/worktree context
- mode/domain contract
- web-search contract
- role context
- personality
- skill guidelines
- infra context
- visual QA context
Examples of turn-scoped sections:
- current time
- transcript recall
- memory recall
- turn guidance
- other step-specific or follow-up-specific context
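The section model above can be sketched as a small descriptor type, assuming illustrative names (PromptSection, composePrompt are not the actual shared primitives): each section declares its cache scope, and the same composition function serves planning, step execution, and follow-up turns with different section mixes.

```typescript
// Hypothetical section descriptor with a declared cache scope.
type CacheScope = "session" | "turn";

interface PromptSection {
  id: string;
  scope: CacheScope;
  render: () => string;
}

// One composition primitive; only the section mix differs per turn type.
function composePrompt(sections: PromptSection[]): string {
  return sections.map(s => s.render()).filter(Boolean).join("\n\n");
}

const exampleMix: PromptSection[] = [
  { id: "base-identity", scope: "session", render: () => "You are ..." },
  { id: "current-time", scope: "turn", render: () => `Now: ${new Date().toISOString()}` },
];
```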
Provider-aware prompt caching
The section scopes now also drive provider-side prompt caching.
- session-scoped cacheable sections become the stable prefix
- turn-scoped sections stay outside the cacheable prefix
- stable-prefix hashes include provider family, execution mode, task domain, stable system blocks, and rendered tool schema
Current provider strategy:
- Anthropic / Azure Anthropic / Anthropic-compatible: prefer Anthropic automatic caching with structured systemBlocks
- OpenRouter Claude: use explicit cache breakpoints on system + the last 3 non-system messages, capped at 4 breakpoints
- OpenAI / Azure OpenAI: derive deterministic prompt-cache keys from the stable prefix so GPT routes can reuse cached prefixes across follow-ups and profile-based routing
- OpenRouter GPT-style routes: keep the same stable-prefix partitioning and cache epoch tracking without Anthropic markers
This is why dynamic sections such as current time, transcript recall, memory recall, and turn guidance are intentionally kept out of the stable prefix.
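The partitioning can be sketched as follows, under the assumption that section scope is the only input to the split; the hash inputs mirror the list above, and createHash is Node's standard crypto API (the other names are illustrative):

```typescript
import { createHash } from "node:crypto";

type Scope = "session" | "turn";
interface Section { id: string; scope: Scope; text: string; }

// session-scoped sections form the stable prefix; turn-scoped sections
// stay outside it so they never invalidate the provider cache.
function partition(sections: Section[]): { stablePrefix: Section[]; dynamicSuffix: Section[] } {
  return {
    stablePrefix: sections.filter(s => s.scope === "session"),
    dynamicSuffix: sections.filter(s => s.scope === "turn"),
  };
}

// Stable-prefix hash over provider family, execution mode, task domain,
// stable system blocks, and the rendered tool schema.
function stablePrefixHash(
  providerFamily: string, executionMode: string, taskDomain: string,
  stableBlocks: string[], renderedToolSchema: string,
): string {
  return createHash("sha256")
    .update([providerFamily, executionMode, taskDomain, ...stableBlocks, renderedToolSchema].join("\u0000"))
    .digest("hex");
}
```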
Section budgeting
Budgets are enforced per section rather than after building one giant prompt.
- each section can be truncated independently
- optional sections can be dropped by priority
- execution logs report which sections were truncated or dropped
This keeps failures interpretable when token pressure rises.
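A sketch of per-section enforcement, with illustrative field names: each section is truncated against its own cap, then optional sections are dropped lowest-priority-first until the total fits, and both lists are reported for logging.

```typescript
// Hypothetical per-section budget enforcement.
interface BudgetedSection {
  id: string;
  text: string;
  maxChars: number;
  optional: boolean;
  priority: number; // lower = dropped first
}

function enforceBudgets(sections: BudgetedSection[], totalBudget: number) {
  const truncated: string[] = [];
  const dropped: string[] = [];
  // 1. truncate each section independently
  let kept = sections.map(s => {
    if (s.text.length > s.maxChars) truncated.push(s.id);
    return { ...s, text: s.text.slice(0, s.maxChars) };
  });
  const total = () => kept.reduce((n, s) => n + s.text.length, 0);
  // 2. drop optional sections by priority until the total fits
  const droppable = kept.filter(s => s.optional).sort((a, b) => a.priority - b.priority);
  for (const victim of droppable) {
    if (total() <= totalBudget) break;
    kept = kept.filter(s => s.id !== victim.id);
    dropped.push(victim.id);
  }
  // 3. report what was truncated or dropped for execution logs
  return { sections: kept, truncated, dropped };
}
```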
Output Budget Policy
When COWORK_LLM_OUTPUT_POLICY=adaptive is enabled, execution turns use a centralized output-budget resolver instead of depending on scattered per-provider defaults.
Shared policy path
The active execution path resolves output limits from:
- provider family
- model id
- request kind
- explicit task-level maxTokens, when present
- global env overrides
- known adapter hard caps where available
- context headroom
This logic is shared across step execution and follow-up loops so they do not drift.
Selection precedence is:
- task-level agentConfig.maxTokens
- COWORK_LLM_MAX_OUTPUT_TOKENS
- adaptive family defaults
- final clamping by known hard caps and context headroom
Request kinds
The policy distinguishes:
- agentic_main
- tool_followup
- continuation
That lets the runtime start conservatively on the first turn, then reserve more output room after tool results are present.
Default behavior
Current internal defaults:
- main execution turn: 8000
- tool follow-up turn: 16000
- escalated retry: 48000
- escalated retry for Anthropic-family routes: 64000
- generic fallback escalation: 16000
The runtime still clamps these values against context headroom and known provider limits before the request is sent.
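The precedence and clamping described above can be sketched in one resolver. The numbers come from the defaults in this section; the field names and function are assumptions, not the actual resolver signature:

```typescript
type RequestKind = "agentic_main" | "tool_followup" | "continuation";

interface BudgetInputs {
  taskMaxTokens?: number;    // task-level agentConfig.maxTokens
  envMaxTokens?: number;     // COWORK_LLM_MAX_OUTPUT_TOKENS
  requestKind: RequestKind;
  adapterHardCap?: number;   // known adapter hard cap, where available
  contextHeadroom: number;
}

// Defaults mirroring this section: 8000 main, 16000 tool follow-up.
const ADAPTIVE_DEFAULTS: Record<RequestKind, number> = {
  agentic_main: 8000,
  tool_followup: 16000,
  continuation: 16000,
};

function resolveOutputBudget(i: BudgetInputs): number {
  const base =
    i.taskMaxTokens ??                 // 1. task-level agentConfig.maxTokens
    i.envMaxTokens ??                  // 2. global env override
    ADAPTIVE_DEFAULTS[i.requestKind];  // 3. adaptive family default
  // 4. final clamping by known hard caps and context headroom
  return Math.min(base, i.adapterHardCap ?? Infinity, i.contextHeadroom);
}
```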
Provider-aware transport mapping
The output-budget resolver also decides which provider field should carry the limit:
- max_tokens
- max_completion_tokens
- max_output_tokens
This keeps transport differences centralized instead of duplicating field-selection logic across execution code paths.
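A sketch of centralizing that decision. The mapping below reflects common provider conventions (Responses-style routes use max_output_tokens, newer chat-completions routes use max_completion_tokens, Anthropic-style routes use max_tokens) and is an assumption, not the exact CoWork OS table:

```typescript
type OutputField = "max_tokens" | "max_completion_tokens" | "max_output_tokens";

// One place decides which provider field carries the output limit.
function outputFieldFor(providerFamily: string): OutputField {
  switch (providerFamily) {
    case "openai-responses": return "max_output_tokens";
    case "openai-chat": return "max_completion_tokens";
    default: return "max_tokens"; // Anthropic-style routes
  }
}
```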
Truncation recovery
When a turn stops because it hit the output limit:
- the runtime retries the same request once with the escalated budget
- if the retried response still truncates but contains visible partial output, the existing continuation path is used
- if the retried response is reasoning-only or otherwise empty, continuation retries are skipped and the runtime surfaces targeted guidance instead
This means continuation prompts are now the fallback path, not the first response to every truncation.
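The decision order above can be sketched as a small state function, with illustrative names: escalate once, then continue only when the retry produced visible partial output.

```typescript
// Hypothetical truncation-recovery decision.
interface TurnResult {
  truncated: boolean;
  visibleText: string; // empty when the response was reasoning-only
}

type Recovery = "done" | "retry_escalated" | "continue_partial" | "surface_guidance";

function recoverFromTruncation(first: TurnResult, retried?: TurnResult): Recovery {
  if (!first.truncated) return "done";
  if (!retried) return "retry_escalated";  // one retry with the escalated budget
  if (!retried.truncated) return "done";
  return retried.visibleText.length > 0
    ? "continue_partial"                   // existing continuation path
    : "surface_guidance";                  // skip continuation retries
}
```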
Explicit chat exception
The explicit chat path keeps its own output behavior for now. The adaptive policy described here applies to the agentic execution and follow-up loops first, while legacy mode preserves the pre-existing executor behavior.
Delegation Contract
Delegation now resolves through a structured briefing contract instead of passing child prompts through unchanged.
Worker roles
Delegation supports:
- auto
- researcher
- implementer
- verifier
- synthesizer
If worker_role is omitted or set to auto, the runtime infers a role from the delegated request:
- read/search/investigate/audit-without-edits -> researcher
- review/check/test/validate/second-opinion -> verifier
- merge/combine/summarize predecessor outputs -> synthesizer
- everything else -> implementer
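A minimal keyword-based sketch of that inference; the actual matching in CoWork OS may be more nuanced, and the function name is an assumption:

```typescript
type WorkerRole = "researcher" | "implementer" | "verifier" | "synthesizer";

// Mirror the buckets above: verifier and synthesizer signals are checked
// before the broader researcher signals; everything else is implementer.
function inferWorkerRole(request: string): WorkerRole {
  const r = request.toLowerCase();
  const hasAny = (...words: string[]) => words.some(w => r.includes(w));
  if (hasAny("review", "check", "test", "validate", "second opinion")) return "verifier";
  if (hasAny("merge", "combine", "summarize")) return "synthesizer";
  if (hasAny("read", "search", "investigate", "audit")) return "researcher";
  return "implementer";
}
```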
Structured delegation brief
Every delegated target receives a brief with the same fields:
- objective
- resolved worker role
- parent task and current-step context
- scope in
- scope out
- known findings or evidence
- expected deliverable
- evidence requirements
- completion contract
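The brief's fields map naturally onto a typed structure. A hypothetical shape, with field names following the list above and a small builder for defaults (both are illustrative, not the actual contract type):

```typescript
interface DelegationBrief {
  objective: string;
  workerRole: "researcher" | "implementer" | "verifier" | "synthesizer";
  parentContext: { taskId: string; currentStep: string };
  scopeIn: string[];
  scopeOut: string[];
  knownFindings: string[];
  expectedDeliverable: string;
  evidenceRequirements: string[];
  completionContract: string;
}

// Every delegated target receives the same fields; the builder just
// fills defaults so callers only specify what they know.
function buildBrief(
  objective: string,
  workerRole: DelegationBrief["workerRole"],
  partial: Partial<DelegationBrief> = {},
): DelegationBrief {
  return {
    objective, workerRole,
    parentContext: { taskId: "", currentStep: "" },
    scopeIn: [], scopeOut: [], knownFindings: [],
    expectedDeliverable: "", evidenceRequirements: [], completionContract: "",
    ...partial,
  };
}
```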
The structured brief is used for:
- native child tasks
- orchestration-graph delegation
- ACP-targeted delegation
- external runtime delegation
Role enforcement
The resolved role is applied through the worker-role registry so the child inherits:
- execution expectations
- completion contract
- tool restrictions
- verifier-specific or researcher-specific behavior where applicable
Extraction-mode guardrails, fanout limits, and depth limits still apply on top of the role contract.
Verification Guidance
Verification is now nudged earlier in the turn loop.
Checklist reminders
task_list_create and task_list_update can append an immediate reminder directly to the tool result when:
- all implementation items are complete
- no verification item exists
- the current plan does not already provide verification coverage
- the task is not already on a verified path that covers verification explicitly
That reminder is returned in the same tool result envelope that the model sees.
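The four conditions above form a single predicate; a sketch with illustrative names:

```typescript
// Hypothetical reminder predicate: all four conditions must hold before
// the reminder is appended to the tool result envelope.
interface ChecklistState {
  implementationItemsComplete: boolean;
  hasVerificationItem: boolean;
  planCoversVerification: boolean;
  onVerifiedPath: boolean;
}

function shouldAppendVerificationReminder(s: ChecklistState): boolean {
  return s.implementationItemsComplete
    && !s.hasVerificationItem
    && !s.planCoversVerification
    && !s.onVerifiedPath;
}
```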
Session reminder
The runtime still keeps the session-level verification nudge state and emits:
- task_list_created
- task_list_updated
- task_list_verification_nudged
This remains the replayable source for renderer and restore paths.
Pre-finalization reminder
Before a task is allowed to wrap up, the executor can inject a pre-finalization reminder when required evidence is still missing, including:
- required test runs
- required execution-tool evidence
- required Playwright QA evidence
- pending verification checklist items
- later machine-checkable verification requirements implied by upcoming steps
The post-hoc verification runtime still remains the final enforcement path. These reminders exist to steer the model earlier, not to replace the final gate.
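A sketch of collecting missing-evidence reasons before finalization, mirroring the list above; the state shape and function name are assumptions:

```typescript
// Hypothetical evidence check feeding the pre-finalization reminder.
interface EvidenceState {
  testsRun: boolean;
  executionToolEvidence: boolean;
  playwrightQaEvidence: boolean;
  pendingChecklistItems: string[];
  upcomingMachineCheckableSteps: string[];
}

function preFinalizationReminders(e: EvidenceState): string[] {
  const reasons: string[] = [];
  if (!e.testsRun) reasons.push("required test runs missing");
  if (!e.executionToolEvidence) reasons.push("required execution-tool evidence missing");
  if (!e.playwrightQaEvidence) reasons.push("required Playwright QA evidence missing");
  reasons.push(...e.pendingChecklistItems.map(i => `pending verification item: ${i}`));
  reasons.push(...e.upcomingMachineCheckableSteps.map(s => `upcoming machine-checkable requirement: ${s}`));
  return reasons; // empty list means the task may wrap up
}
```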
Runtime Boundaries
The current split is:
- SessionRuntime owns session-state buckets, checklist state, recovery state, tool visibility/render caching, and snapshot restore
- TaskExecutor owns bootstrap, planning, finalization, completion policy, and the execution prompt section cache
- TurnKernel owns a single active turn
That keeps state ownership explicit while still letting planning, execution, follow-up, delegation, and verification share the same runtime concepts.