Graphorin API reference v0.1.0
@graphorin/provider-llamacpp-node
Companion package to `@graphorin/provider`: in-process GGUF execution for the Graphorin framework.
Wraps `node-llama-cpp@^3.5` to load `.gguf` model files directly into the same Node process. No daemon, no port to manage, no GPU contention with other processes. The trust class is permanently loopback because the model lives in the same trust boundary as the host process.
Installation
```shell
pnpm add @graphorin/provider-llamacpp-node node-llama-cpp
```

Quick start
```typescript
import { llamaCppNodeAdapter } from '@graphorin/provider-llamacpp-node';
import { createProvider } from '@graphorin/provider';

const provider = createProvider(
  llamaCppNodeAdapter({
    modelPath: '/path/to/qwen2.5-7b-instruct-q4_k_m.gguf',
    gpuLayers: 'auto',
  }),
);
```

Native token counting
```typescript
import { LlamaCppNativeCounter } from '@graphorin/provider-llamacpp-node';
import { setGlobalTokenCounter } from '@graphorin/provider/counters';

setGlobalTokenCounter(
  new LlamaCppNativeCounter({
    model: loadedGgufModel,
    modelPath: '/path/to/qwen2.5-7b.gguf',
  }),
);
```

The counter wraps the GGUF tokenizer directly, which is strictly tighter than the `cl100k_base` proxy used by the HTTP-shaped adapters.
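A minimal sketch of how such a native counter can work internally, assuming only the documented behaviour (delegate to the model's tokenizer, key the cache on the model file path). All names here are illustrative, not the package's actual API:

```typescript
// Anything with a tokenize() method whose result has a length works here;
// a loaded GGUF model from node-llama-cpp satisfies this shape.
interface TokenizingModel {
  tokenize(text: string): ArrayLike<unknown>;
}

// Sketch: count tokens via the model's own tokenizer, caching per text.
// The cache key includes the model path, so swapping models never serves
// stale counts from a different vocabulary.
class NativeCounterSketch {
  private cache = new Map<string, number>();

  constructor(
    private model: TokenizingModel,
    private modelPath: string = '(in-memory)',
  ) {}

  count(text: string): number {
    const key = `${this.modelPath}\u0000${text}`;
    let n = this.cache.get(key);
    if (n === undefined) {
      n = this.model.tokenize(text).length;
      this.cache.set(key, n);
    }
    return n;
  }
}
```

Because the count comes from the model's real vocabulary rather than a proxy encoding, budget checks built on it cannot under-count.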
HITL durable-resume tradeoff
The in-process adapter does not survive a process restart mid-stream — the model context lives in the running process and is lost on exit. For human-in-the-loop workflows that need durable mid-stream resume across restarts, prefer one of the HTTP-shaped adapters instead:
- `ollamaAdapter` — Ollama HTTP daemon
- `llamaCppServerAdapter` — upstream `llama-server` binary
- `openAICompatibleAdapter` — LM Studio / LocalAI / vLLM / Together-style endpoints
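As a configuration sketch only, swapping in an HTTP-shaped adapter keeps the model context in the daemon rather than the Node process. The import path and option names (`baseUrl`, `model`) are assumptions for illustration; consult the adapter's own reference for the confirmed shape:

```typescript
import { createProvider } from '@graphorin/provider';
// Hypothetical import location for the HTTP-shaped adapter.
import { ollamaAdapter } from '@graphorin/provider';

const provider = createProvider(
  ollamaAdapter({
    // Assumed option names: point at the local Ollama daemon,
    // which holds the model across host-process restarts.
    baseUrl: 'http://127.0.0.1:11434',
    model: 'qwen2.5:7b-instruct',
  }),
);
```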
GGUF model provenance
.gguf model files are not signed by default. Pull only from trusted publishers and verify the SHA-256 of the downloaded file against the publisher's manifest:
- huggingface.co/ggml-org
- huggingface.co/TheBloke
- huggingface.co/bartowski
- huggingface.co/unsloth
- huggingface.co/Qwen (official Qwen distributions)
Full provenance enforcement (allowlist + Sigstore signature verification) is a future Graphorin work item; v0.1 documents the discipline rather than enforcing it at runtime.
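Until runtime enforcement lands, the SHA-256 check can be scripted before the model is ever loaded. A minimal Node sketch, streaming so a multi-gigabyte `.gguf` file never has to fit in memory:

```typescript
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

// Stream the file through a SHA-256 hash in chunks.
function sha256File(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    createReadStream(path)
      .on('data', (chunk: Buffer) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')))
      .on('error', reject);
  });
}

// Compare against the digest from the publisher's manifest and refuse
// to proceed on mismatch.
async function verifyModel(path: string, expectedSha256: string): Promise<void> {
  const actual = await sha256File(path);
  if (actual !== expectedSha256.toLowerCase()) {
    throw new Error(`SHA-256 mismatch for ${path}: got ${actual}`);
  }
}
```

Running `verifyModel` before handing the path to `llamaCppNodeAdapter` gives the documented discipline a mechanical backstop, even without the planned allowlist/Sigstore enforcement.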
Project metadata
- Project Graphorin · v0.1.0 · MIT License · © 2026 Oleksiy Stepurenko
- Repository: https://github.com/o-stepper/graphorin
`@graphorin/provider-llamacpp-node` — in-process GGUF execution adapter for the Graphorin framework. The package wraps `node-llama-cpp@^3.5` to load `.gguf` model files directly into the same Node process: no daemon, no port to manage, no GPU contention with other processes.
The adapter declares `trust: 'loopback'` permanently because the model lives in the same trust boundary as the host process; this mirrors `@graphorin/embedder-transformersjs` (an in-process embedder with the same trust boundary).
The package is operationally simpler than the HTTP-shaped adapters but does not survive a process restart mid-stream: the model context lives in the process and is lost on exit. For HITL durable mid-stream resume, one of the HTTP-shaped adapters (`ollamaAdapter`, `llamaCppServerAdapter`, `openAICompatibleAdapter`) is the better choice.
Classes
| Class | Description |
|---|---|
| LlamaCppNativeCounter | Counter that delegates to model.tokenize(text) from the loaded GGUF instance. Cache invalidation is keyed on the model file path (when supplied) so swapping models invalidates per-message caches upstream. |
Interfaces
| Interface | Description |
|---|---|
| LlamaCppNativeCounterOptions | Options for LlamaCppNativeCounter. |
| LlamaCppNodeAdapterOptions | Options accepted by llamaCppNodeAdapter. |
| LlamaCppNodeRuntimeOverrides | Test-only shape for injecting fixture-driven runtime behaviour. |
| LlamaInstance | Llama engine instance (returned by getLlama()). |
| LlamaModelInstance | Loaded GGUF model. |
| LlamaSessionInstance | Loaded chat session capable of streaming responses. |
Variables
| Variable | Description |
|---|---|
| VERSION | Canonical version constant. Mirrors the package.json version. |
Functions
| Function | Description |
|---|---|
| llamaCppNodeAdapter | Build a Graphorin Provider backed by an in-process GGUF model. The first call lazily loads the node-llama-cpp peer + the model file; subsequent calls reuse the cached instances. |
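The lazy load-and-reuse behaviour described for `llamaCppNodeAdapter` is a standard memoization pattern. A minimal sketch (not the package's actual implementation):

```typescript
// Wrap an expensive async load so it runs at most once; every later
// call reuses the same promise, including concurrent first calls.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

// Stand-in for loading the node-llama-cpp peer and the GGUF file.
let loads = 0;
const getModel = lazy(async () => {
  loads += 1;
  return { name: 'qwen2.5-7b' };
});
```

Caching the promise itself (rather than the resolved value) means two requests arriving before the model finishes loading share one load instead of racing to load it twice.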