# Agent runtime
`@graphorin/agent` is the runtime layer of the framework. It owns the typed model -> tool calls -> model loop, the streaming event surface, durable human-in-the-loop approvals, multi-agent handoffs, agent-level model fallback, post-compaction hooks, per-tool model-tier hints, and a lateral-leak defense layer.
## Library-mode-first
Every primitive that is useful from a script ships from the npm package without the optional standalone server:
- `createAgent({...})`
- `RunState.toJSON()` / `RunState.fromJSON(serialized, agent)`
- The filter library
- `evaluatorOptimizer({...})`
- `agent.fanOut({...})`
- `agent.progress.write(...)` / `agent.progress.read(...)`
Promote to the standalone server only when your assistant has to outlive a single Node.js process or expose a network API.
## Quick start
```ts
import { createAgent } from '@graphorin/agent';
import { createProvider, ollamaAdapter } from '@graphorin/provider';

const agent = createAgent({
  name: 'helpful-assistant',
  instructions: 'You are a helpful, concise assistant.',
  provider: createProvider(
    ollamaAdapter({ baseURL: 'http://127.0.0.1:11434', model: 'qwen2.5:7b-instruct' }),
    { acceptsSensitivity: ['public', 'internal'] },
  ),
});

for await (const event of agent.stream('Plan a trip to Mars')) {
  if (event.type === 'text.delta') process.stdout.write(event.delta);
}
```

## Streaming-first
Every operation returns `AsyncIterable<AgentEvent<TOutput>>`. `agent.run(...)` is a thin "collect" helper that exhausts the stream. The discriminated `AgentEvent<TOutput>` union is exhaustive — every event type is its own typed interface — and the runtime uses an `assertNever(...)` default branch so compilation fails the moment a new event type lands without a handler:
```ts
// A simplified shape that mirrors the @graphorin/core
// `AgentEvent<TOutput>` discriminated union.
type AgentEvent<TOutput> =
  | { type: 'agent.start'; runId: string }
  | { type: 'step.start'; stepNumber: number }
  | { type: 'text.delta'; delta: string }
  | { type: 'tool.call.start'; toolCallId: string; toolName: string }
  | { type: 'tool.call.end'; toolCallId: string }
  | { type: 'tool.execute.start'; toolCallId: string }
  | { type: 'tool.execute.end'; toolCallId: string }
  | { type: 'tool.approval.requested'; toolCallId: string }
  | { type: 'context.compacted'; trimmedTokens: number }
  | { type: 'agent.model.fellback'; previousModel: string; nextModel: string }
  | { type: 'agent.end'; output: TOutput };

function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

function handle<TOutput>(event: AgentEvent<TOutput>): void {
  switch (event.type) {
    case 'text.delta':
      process.stdout.write(event.delta);
      return;
    case 'tool.call.start':
    case 'tool.call.end':
    case 'tool.execute.start':
    case 'tool.execute.end':
      console.log(event.toolCallId);
      return;
    case 'tool.approval.requested':
      console.log('approval needed for', event.toolCallId);
      return;
    case 'agent.model.fellback':
      console.log('fellback', event.previousModel, '->', event.nextModel);
      return;
    case 'agent.start':
    case 'step.start':
    case 'context.compacted':
    case 'agent.end':
      return;
    default:
      assertNever(event);
  }
}
```

## Durable HITL
`runStateToJSON(runState)` / `runStateFromJSON(serialised, agent)` round-trip the full run state through any storage the caller picks (file, SQLite, KV, S3). A pending approval can be persisted, the process can shut down, and another machine can pick up exactly where the first left off by re-invoking `agent.run(savedRunState, { directive: { approvals: [...] } })`.
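The round-trip above can be sketched with a plain file as the storage backend. Everything here is illustrative: the run-state payload, the `ApprovalDecision` shape, and the `approved` field name are assumptions standing in for the library's actual serialized form.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical shape for a human's decision on a pending approval.
interface ApprovalDecision {
  toolCallId: string;
  approved: boolean;
}

// Persist the serialized run state wherever the caller likes.
function saveRunState(path: string, serialized: string): void {
  writeFileSync(path, serialized, 'utf8');
}

// Later -- possibly on another machine -- load the state and attach the
// human's decisions as a resume directive.
function loadResumeArgs(path: string, decisions: ApprovalDecision[]) {
  const serialized = readFileSync(path, 'utf8');
  return { serialized, options: { directive: { approvals: decisions } } };
}

// Example round-trip through a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), 'runstate-'));
const statePath = join(dir, 'run-1.json');
saveRunState(statePath, JSON.stringify({ runId: 'run-1', pending: ['call-1'] }));
const resume = loadResumeArgs(statePath, [{ toolCallId: 'call-1', approved: true }]);
```

Because the state is plain JSON, the same pattern works unchanged against SQLite, KV, or S3 — only `saveRunState` / `loadResumeArgs` change.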
The `tool.approval.requested` event carries the `toolCallId` plus the tool's classification metadata. Operators that need to suspend the run combine the event with a snapshot of the current `RunState`:
```ts
import { runStateToJSON } from '@graphorin/agent';

for await (const event of agent.stream('Refund the last order if it qualifies', {
  sessionId: 's1',
  userId: 'u1',
})) {
  if (event.type === 'tool.approval.requested') {
    const serialised = runStateToJSON(currentRunState);
    await persist(serialised);
    return; // process exits; humans look at the approval offline
  }
}
```

## Multi-agent
`agent.toTool({ name, description, exposeTurns, secretsInheritance, inheritSecrets, inputFilter })` wraps an agent as a typed tool the parent agent can call. The default `secretsInheritance: 'inherit-allowlist'` with an empty `inheritSecrets` array enforces the principle of least authority — sub-agents inherit nothing unless explicitly granted.
| `secretsInheritance` | Behaviour |
|---|---|
| `'inherit-allowlist'` (default) | Sub-agent inherits only the secret refs explicitly listed in `inheritSecrets`. |
| `'forward-explicit'` | Sub-agent receives only the secret refs forwarded for this specific call. |
| `'isolated'` | Sub-agent receives no inherited secrets at all. |
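The three modes reduce to a small resolution function. This is a minimal sketch of the semantics in the table, not the library's implementation; the function name and parameters are illustrative.

```typescript
type SecretsInheritance = 'inherit-allowlist' | 'forward-explicit' | 'isolated';

// Resolve which secret refs a sub-agent may see. A ref is only ever
// granted if the parent actually holds it (least authority).
function resolveSubAgentSecrets(
  mode: SecretsInheritance,
  parentSecrets: ReadonlySet<string>,
  inheritSecrets: readonly string[],
  forwardedForCall: readonly string[],
): Set<string> {
  switch (mode) {
    case 'inherit-allowlist':
      // Only refs both held by the parent and explicitly allowlisted.
      return new Set(inheritSecrets.filter((ref) => parentSecrets.has(ref)));
    case 'forward-explicit':
      // Only refs forwarded for this specific call.
      return new Set(forwardedForCall.filter((ref) => parentSecrets.has(ref)));
    case 'isolated':
      return new Set();
  }
}
```

Note that the default (`'inherit-allowlist'` with an empty allowlist) degenerates to the empty set, which is exactly the "inherit nothing unless explicitly granted" behaviour described above.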
## Filter library
Handoffs use a built-in filter library to shape the payload that crosses the boundary. Every filter returns a serializable `HandoffInputFilterDescriptor`, so a JSONL session export can replay the same boundary byte-for-byte.
| Filter | What it does |
|---|---|
| `filters.lastN(n)` | Keep only the last N messages. |
| `filters.lastUser` | Keep only the latest user turn. |
| `filters.summary({...})` | Replace history with a summary. |
| `filters.bySensitivity({...})` | Keep / drop / require by `Sensitivity`. |
| `filters.stripReasoning()` | Drop reasoning content parts. |
| `filters.stripSensitiveOutputs()` | Drop sensitive tool outputs. |
| `filters.stripToolCalls()` | Drop tool calls. |
| `filters.compose(...)` | Compose any of the above. |
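As a mental model, the filters behave like pure functions over the message list. The real filters return serializable descriptors rather than closures; the plain-function shape below is a simplification for clarity, and the `Message` type is an assumption.

```typescript
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type HandoffFilter = (messages: Message[]) => Message[];

// Keep only the last N messages.
const lastN = (n: number): HandoffFilter => (messages) => messages.slice(-n);

// Keep only the latest user turn.
const lastUser: HandoffFilter = (messages) => {
  const latest = [...messages].reverse().find((m) => m.role === 'user');
  return latest ? [latest] : [];
};

// Drop tool calls.
const stripToolCalls: HandoffFilter = (messages) =>
  messages.filter((m) => m.role !== 'tool');

// Apply filters left to right, mirroring filters.compose(...).
const compose = (...filterList: HandoffFilter[]): HandoffFilter =>
  (messages) => filterList.reduce((acc, f) => f(acc), messages);
```

Composition order matters: `compose(stripToolCalls, lastN(2))` drops tool turns first, then windows, which can keep more conversational context than windowing first.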
## Cancellation
`agent.abort({ drain, onPendingApprovals })` is hard-kill by default with a 50 ms grace window. Set `drain: true` to wait for the current step to complete; choose how pending approvals behave with `onPendingApprovals: 'deny' | 'hold' | 'fail'` (default `'deny'`).
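The grace-window behaviour can be sketched as a race between the in-flight step and a timer, with a forced abort only when the timer wins. The function name, signature, and use of `AbortController` here are assumptions about the mechanism, not the library's API.

```typescript
// Race the in-flight step against a grace timer; hard-kill via the
// AbortController only if the step does not drain in time.
async function abortWithGrace(
  inFlightStep: Promise<void>,
  controller: AbortController,
  graceMs = 50,
): Promise<'drained' | 'killed'> {
  const grace = new Promise<'killed'>((resolve) =>
    setTimeout(() => resolve('killed'), graceMs),
  );
  const outcome = await Promise.race([
    inFlightStep.then(() => 'drained' as const),
    grace,
  ]);
  if (outcome === 'killed') controller.abort(); // hard-kill after the window
  return outcome;
}
```

`drain: true` corresponds to skipping the timer entirely and awaiting the step; the default corresponds to the short window above.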
## Reasoning preservation
Tool-use loops round-trip reasoning content parts (with opaque meta such as `signature` / `data`) into the next provider call when the effective `reasoningRetention` is not `'strip'`. The handoff boundary is independent: `filters.stripReasoning()` is always applied to messages forwarded to a sub-agent, regardless of the intra-loop policy.
## Agent-level model fallback
```ts
import { createAgent } from '@graphorin/agent';
import { createProvider, ollamaAdapter, vercelAdapter } from '@graphorin/provider';

const agent = createAgent({
  name: 'helpful-assistant',
  instructions: 'You are a helpful, concise assistant.',
  provider: createProvider(vercelAdapter({ provider: 'openai', model: 'gpt-4o' })),
  fallbackModels: [
    {
      provider: createProvider(vercelAdapter({ provider: 'openai', model: 'gpt-4o-mini' })),
      model: 'gpt-4o-mini',
    },
    {
      provider: createProvider(ollamaAdapter({ model: 'qwen2.5:7b-instruct' })),
      model: 'qwen2.5:7b-instruct',
    },
  ],
});
```

`fallbackModels: ReadonlyArray<ModelSpec>` retries the whole step against the next model on rate-limit, capacity, or context-length errors. A `ModelSpec` is either a `Provider` instance or `{ provider, model }`. The `agent.model.fellback` event fires per transition, and per-model usage attribution lands in `RunState.usage.byModel`.
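The fallback walk itself is a simple ordered retry. The synchronous sketch below shows the control flow under stated assumptions: the error shape, the retryability classifier, and the `onFellback` callback are all illustrative stand-ins for the runtime's internals.

```typescript
interface ModelAttempt<T> {
  model: string;
  call: () => T;
}

// Try each model in order; advance only on errors classified as
// retryable (rate-limit, capacity, context-length). Non-retryable
// errors surface immediately.
function runWithFallback<T>(
  attempts: ReadonlyArray<ModelAttempt<T>>,
  isRetryable: (err: unknown) => boolean,
  onFellback: (previousModel: string, nextModel: string) => void = () => {},
): T {
  let lastError: unknown;
  for (let i = 0; i < attempts.length; i++) {
    try {
      return attempts[i].call();
    } catch (err) {
      if (!isRetryable(err)) throw err;
      lastError = err;
      // Mirrors the per-transition agent.model.fellback event.
      if (i + 1 < attempts.length) onFellback(attempts[i].model, attempts[i + 1].model);
    }
  }
  throw lastError; // every model exhausted
}
```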
## Post-compaction hooks
When the `@graphorin/memory` `contextEngine` auto-compacts the buffer, the runtime fires every registered `postCompactionHooks` entry between the trim and the next `provider.stream(...)` call. Failed hooks are isolated; the harness continues with the survivors.
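The failure-isolation contract can be sketched as each hook running inside its own try/catch, so one throwing hook never blocks the rest. The hook signature here is an assumption.

```typescript
type PostCompactionHook = (trimmedTokens: number) => void;

// Run every hook; collect outcomes instead of letting one failure
// abort the batch.
function runPostCompactionHooks(
  hooks: ReadonlyArray<PostCompactionHook>,
  trimmedTokens: number,
): { succeeded: number; failed: number } {
  let succeeded = 0;
  let failed = 0;
  for (const hook of hooks) {
    try {
      hook(trimmedTokens);
      succeeded += 1;
    } catch {
      failed += 1; // isolated: the remaining hooks still run
    }
  }
  return { succeeded, failed };
}
```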
## Agent-step-level fan-out
```ts
const result = await agent.fanOut({
  children: [
    { agentId: 'researcher', invoke: () => childA.run('Research the topic') },
    { agentId: 'writer', invoke: () => childB.run('Draft the section') },
  ],
  mergeStrategy: { kind: 'concat', separator: '\n\n' },
  perBudget: { tokens: 4000, toolCalls: 8, durationMs: 30_000 },
  maxConcurrentChildren: 4,
});
```

`agent.fanOut(...)` is a thin wrapper over the standalone `runFanOut(...)` helper. It spawns N sub-agents under a bounded-fanout cap (default `maxConcurrentChildren: 4`) with per-child token / tool-call / duration budgets and four built-in merge strategies:
| `mergeStrategy.kind` | Shape | Behaviour |
|---|---|---|
| `'concat'` | `{ kind: 'concat'; separator?: string }` (default) | Concatenate every successful child output. |
| `'first-success'` | `{ kind: 'first-success' }` | Pick the first child that completes successfully. |
| `'judge-merge'` | `{ kind: 'judge-merge'; judge: (children) => Promise<TOutput> }` | Operator-supplied judge function. Guarded by the merge guard. |
| `'custom'` | `{ kind: 'custom'; merge: (children) => Promise<TOutput> }` | Operator-supplied merge function. |
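The two simplest strategies can be sketched over a minimal child-result record. The `ChildResult` shape is an assumption; the runtime's real child records carry more metadata (usage, budgets, attribution).

```typescript
interface ChildResult<T> {
  agentId: string;
  ok: boolean;
  output?: T;
}

// 'concat': join every successful child output with the separator.
function mergeConcat(
  children: ReadonlyArray<ChildResult<string>>,
  separator = '\n\n',
): string {
  return children
    .filter((c) => c.ok && c.output !== undefined)
    .map((c) => c.output as string)
    .join(separator);
}

// 'first-success': take the first child that completed successfully.
function mergeFirstSuccess<T>(children: ReadonlyArray<ChildResult<T>>): T {
  const winner = children.find((c) => c.ok && c.output !== undefined);
  if (winner === undefined) throw new Error('no child completed successfully');
  return winner.output as T;
}
```

`'judge-merge'` and `'custom'` replace these built-ins with an operator-supplied async function over the same child records.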
## Evaluator-optimizer loop
`evaluatorOptimizer({...})` is a Generator → Evaluator iteration loop with three rubric kinds (`'free-form'`, `'zod'`, `'llm-judge'`) and a required iteration cap.
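The loop's shape is easy to sketch with a plain predicate-plus-feedback rubric standing in for the three rubric kinds; the function and parameter names below are illustrative, not the library's API.

```typescript
interface Verdict {
  pass: boolean;
  feedback: string;
}

// Generate, evaluate, and regenerate with feedback until the rubric
// passes or the required iteration cap is hit.
function evaluatorOptimizerLoop<T>(
  generate: (feedback: string | null) => T,
  evaluate: (candidate: T) => Verdict,
  maxIterations: number,
): T {
  let feedback: string | null = null;
  let candidate = generate(feedback);
  for (let iteration = 1; iteration < maxIterations; iteration++) {
    const verdict = evaluate(candidate);
    if (verdict.pass) return candidate;
    feedback = verdict.feedback;
    candidate = generate(feedback); // regenerate with the evaluator's notes
  }
  return candidate; // cap reached: return the best-effort candidate
}
```

The cap being required (rather than optional) is what keeps a never-satisfied rubric from looping forever.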
## Progress artifacts
`agent.progress.write(content, { role, seq, sensitivity, tags })` and `agent.progress.read({ runId, role, sinceSeq, maxArtifacts })` persist UTF-8 text artifacts to the artifact root via the atomic-write `.tmp` + rename discipline, so cross-session continuity holds even across hard crashes.
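The `.tmp` + rename discipline itself is small enough to show directly. This is a generic sketch of the pattern, not the library's writer; the key property is that the rename is atomic on the same filesystem, so a reader never observes a half-written artifact.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Write the full payload to a sibling temp file, then rename it into
// place. A crash before the rename leaves only the .tmp file behind;
// the final path is either absent or complete, never torn.
function atomicWrite(path: string, content: string): void {
  const tmpPath = `${path}.tmp`;
  writeFileSync(tmpPath, content, 'utf8');
  renameSync(tmpPath, path);
}

// Usage against a throwaway artifact root.
const artifactRoot = mkdtempSync(join(tmpdir(), 'artifacts-'));
const artifactPath = join(artifactRoot, 'progress-001.txt');
atomicWrite(artifactPath, 'step 1: outline drafted');
const restored = readFileSync(artifactPath, 'utf8');
```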
## Per-tool model-tier hints
```ts
import { tool } from '@graphorin/tools';

const planTool = tool({
  name: 'plan',
  description: 'Generate a multi-step plan',
  preferredModel: 'smart',
  // …
});
```

`Agent.modelTierMap` resolves the cost-tier vocabulary (`'fast' | 'balanced' | 'smart'`) to concrete `Provider` instances at agent warm-up. The per-step planner walks the precedence ladder once per step:
```
'prepare-step' > 'tier-map' | 'spec' > 'agent-preferred' > 'fallthrough-default'
```

## Lateral-leak defense layer
Three opt-in agent-level guards are configured on `createAgent({ causalityMonitor, mergeGuard, protocolGuard })`. They compose orthogonally with the other security layers (sub-agent secrets isolation, handoff input filter, outbound redaction, inbound sanitisation):
- `causalityMonitor` — implements an Agentic Reference Monitor pattern: every cross-agent flow is checked against the stated capability, with a configurable strictness level.
- `mergeGuard` — per-child trust scoring + bias detection on the `'judge-merge'` fan-out strategy.
- `protocolGuard` — control-character escape catalogue applied at protocol boundaries.
- Commentary-phase trace sanitisation runs at the session-output boundary in `@graphorin/sessions`.
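As a flavour of what a protocol-boundary escape pass looks like, here is a minimal sketch that escapes C0 control characters (except tab, newline, and carriage return) and DEL. The exact catalogue `protocolGuard` applies is an assumption.

```typescript
// Escape control characters that could smuggle terminal or protocol
// control sequences across a boundary, leaving ordinary whitespace
// (\t, \n, \r) intact.
function escapeControlChars(input: string): string {
  return input.replace(
    /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g,
    (ch) => `\\u${ch.charCodeAt(0).toString(16).padStart(4, '0')}`,
  );
}
```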
## Inbound sanitisation preamble
When the assembled message list contains any non-trusted `MessageContent` part, the runtime appends the locale-resolved preamble fragment to the system prompt after the cache breakpoint, so the trusted-only cache prefix is not invalidated.
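The placement rule is the whole point: appending after the breakpoint keeps the trusted prefix byte-identical across calls, so provider-side prompt caching still hits. A minimal sketch, with an assumed `SystemPart` shape:

```typescript
interface SystemPart {
  text: string;
  cacheBreakpoint?: boolean;
}

// Append the preamble only when untrusted content is present, and
// always after the existing parts -- never inserted before the cache
// breakpoint, which would invalidate the cached prefix.
function withSanitisationPreamble(
  systemParts: ReadonlyArray<SystemPart>,
  preamble: string,
  hasUntrustedContent: boolean,
): SystemPart[] {
  if (!hasUntrustedContent) return [...systemParts];
  return [...systemParts, { text: preamble }];
}
```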
## Next steps
- Memory system — what `memory.tools` exposes.
- Tools — how to declare your own typed tools.
- Workflow engine — durable graph runs that span multiple agent steps.
- Sessions — multi-agent attribution and replay.
Graphorin · v0.1.0 · MIT License · © 2026 Oleksiy Stepurenko