Additional

Execution Runtime Model

Synced from github.com/CoWork-OS/CoWork-OS/docs

This document describes the current execution concept for planning, step execution, follow-up turns, delegation, and verification guidance in CoWork OS.

The model is built around four rules:

tool guidance should live with the tool definition instead of being duplicated across prompts
prompt assembly should be sectioned, budgeted, and cache-aware
delegated work should receive a structured brief instead of a raw prompt passthrough
verification guidance should appear before finalization, not only after a post-hoc gate fires

Tool Prompt Layer

Internal tool definitions now carry prompt-render metadata before provider serialization.

LLMTool includes optional internal prompting metadata
one shared render path produces both:
- the final description sent to the active LLM provider
- the compact tool text used in planning
provider adapters still receive only:
- name
- rendered description
- input_schema

Render timing

Tool prompts are rendered only after:

policy filtering
mode filtering
task allowlist and restriction filtering
adaptive tool-availability filtering

This keeps hidden tools from consuming prompt budget.

Render cache

SessionRuntime.getAvailableTools() caches the rendered visible tool set. The cache key includes:

tool catalog version
disabled, restricted, and allowlisted tool state
execution mode
task domain
web-search mode
shell availability
agent type
worker role
user-input allowance
visible tool names

That makes the rendered tool array stable for repeated turns while still invalidating when visibility or execution context changes.

Prompt-aware rollout tools

The tool-prompt layer currently enriches the highest-impact tool families first:

spawn_agent
orchestrate_agents
run_command
web_search
web_fetch
browser navigation/content/text/screenshot tools
request_user_input
task_list_create
task_list_update
create_diagram
qa_run

Descriptions are intentionally capped so provider tool arrays do not grow without bound.

Prompt Stack

Execution prompts are assembled from named sections instead of a single monolithic string.

The shared pieces are:

executor-prompt-sections.ts
ContentBuilder
QueryOrchestrator

Planning, step execution, and follow-up turns all use the same section-composition primitives even when the section mix differs.

Section scopes

Each section declares a cache scope:

session: stable content that can be memoized across turns
turn: dynamic content that is recomputed when inputs change

Examples of stable session-scoped sections:

base identity
shared safety core
autonomy or input gate
workspace/worktree context
mode/domain contract
web-search contract
role context
personality
skill guidelines
infra context
visual QA context

Examples of turn-scoped sections:

current time
transcript recall
memory recall
turn guidance
other step-specific or follow-up-specific context

Provider-aware prompt caching

The section scopes now also drive provider-side prompt caching.

session-scoped cacheable sections become the stable prefix
turn-scoped sections stay outside the cacheable prefix
stable-prefix hashes include provider family, execution mode, task domain, stable system blocks, and rendered tool schema

Current provider strategy:

Anthropic / Azure Anthropic / Anthropic-compatible: prefer Anthropic automatic caching with structured systemBlocks
OpenRouter Claude: use explicit system + last 3 non-system messages, capped at 4 breakpoints
OpenAI / Azure OpenAI: derive deterministic prompt-cache keys from the stable prefix so GPT routes can reuse cached prefixes across follow-ups and profile-based routing
OpenRouter GPT-style routes: keep the same stable-prefix partitioning and cache epoch tracking without Anthropic markers

This is why dynamic sections such as current time, transcript recall, memory recall, and turn guidance are intentionally kept out of the stable prefix.

Section budgeting

Budgets are enforced per section rather than after building one giant prompt.

each section can be truncated independently
optional sections can be dropped by priority
execution logs report which sections were truncated or dropped

This keeps failures interpretable when token pressure rises.

Output Budget Policy

When COWORK_LLM_OUTPUT_POLICY=adaptive is enabled, execution turns use a centralized output-budget resolver instead of depending on scattered per-provider defaults.

Shared policy path

The active execution path resolves output limits from:

provider family
model id
request kind
explicit task-level maxTokens, when present
global env overrides
known adapter hard caps where available
context headroom

This logic is shared across step execution and follow-up loops so they do not drift.

Selection precedence is:

task-level agentConfig.maxTokens
COWORK_LLM_MAX_OUTPUT_TOKENS
adaptive family defaults
final clamping by known hard caps and context headroom

Request kinds

The policy distinguishes:

agentic_main
tool_followup
continuation

That lets the runtime start conservatively on the first turn, then reserve more output room after tool results are present.

Default behavior

Current internal defaults:

main execution turn: 8000
tool follow-up turn: 16000
escalated retry: 48000
escalated retry for Anthropic-family routes: 64000
generic fallback escalation: 16000

The runtime still clamps these values against context headroom and known provider limits before the request is sent.

Provider-aware transport mapping

The output-budget resolver also decides which provider field should carry the limit:

max_tokens
max_completion_tokens
max_output_tokens

This keeps transport differences centralized instead of duplicating field-selection logic across execution code paths.

Truncation recovery

When a turn stops because it hit the output limit:

the runtime retries the same request once with the escalated budget
if the retried response still truncates but contains visible partial output, the existing continuation path is used
if the retried response is reasoning-only or otherwise empty, continuation retries are skipped and the runtime surfaces targeted guidance instead

This means continuation prompts are now the fallback path, not the first response to every truncation.

Explicit chat exception

The explicit chat path keeps its own output behavior for now. The adaptive policy described here applies to the agentic execution and follow-up loops first, while legacy mode preserves the pre-existing executor behavior.

Delegation Contract

Delegation now resolves through a structured briefing contract instead of passing child prompts through unchanged.

Worker roles

Delegation supports:

auto
researcher
implementer
verifier
synthesizer

If worker_role is omitted or set to auto, the runtime infers a role from the delegated request:

read/search/investigate/audit-without-edits -> researcher
review/check/test/validate/second-opinion -> verifier
merge/combine/summarize predecessor outputs -> synthesizer
everything else -> implementer

Structured delegation brief

Every delegated target receives a brief with the same fields:

objective
resolved worker role
parent task and current-step context
scope in
scope out
known findings or evidence
expected deliverable
evidence requirements
completion contract

The structured brief is used for:

native child tasks
orchestration-graph delegation
ACP-targeted delegation
external runtime delegation

Role enforcement

The resolved role is applied through the worker-role registry so the child inherits:

execution expectations
completion contract
tool restrictions
verifier-specific or researcher-specific behavior where applicable

Extraction-mode guardrails, fanout limits, and depth limits still apply on top of the role contract.

Verification Guidance

Verification is now nudged earlier in the turn loop.

Checklist reminders

task_list_create and task_list_update can append an immediate reminder directly to the tool result when:

all implementation items are complete
no verification item exists
the current plan does not already provide verification coverage
the task is not already on a verified path that covers verification explicitly

That reminder is returned in the same tool result envelope that the model sees.

Session reminder

The runtime still keeps the session-level verification nudge state and emits:

task_list_created
task_list_updated
task_list_verification_nudged

This remains the replayable source for renderer and restore paths.

Pre-finalization reminder

Before a task is allowed to wrap up, the executor can inject a pre-finalization reminder when required evidence is still missing, including:

required test runs
required execution-tool evidence
required Playwright QA evidence
pending verification checklist items
later machine-checkable verification requirements implied by upcoming steps

The post-hoc verification runtime still remains the final enforcement path. These reminders exist to steer the model earlier, not to replace the final gate.

Runtime Boundaries

The current split is:

SessionRuntime owns session-state buckets, checklist state, recovery state, tool visibility/render caching, and snapshot restore
TaskExecutor owns bootstrap, planning, finalization, completion policy, and the execution prompt section cache
TurnKernel owns a single active turn

That keeps state ownership explicit while still letting planning, execution, follow-up, delegation, and verification share the same runtime concepts.

Evolving Agent Intelligence

Features

Was this page helpful?Edit this page on GitHub