Graphorin API reference v0.1.0


@graphorin/provider-llamacpp-node

Companion package to @graphorin/provider — in-process GGUF execution for the Graphorin framework.

Wraps node-llama-cpp@^3.5 to load .gguf model files directly into the same Node process. No daemon, no port to manage, no GPU contention with other processes. The trust class is permanently loopback because the model lives in the same trust boundary as the host process.

Installation

bash
pnpm add @graphorin/provider-llamacpp-node node-llama-cpp

Quick start

ts
import { llamaCppNodeAdapter } from '@graphorin/provider-llamacpp-node';
import { createProvider } from '@graphorin/provider';

const provider = createProvider(
  llamaCppNodeAdapter({
    modelPath: '/path/to/qwen2.5-7b-instruct-q4_k_m.gguf',
    gpuLayers: 'auto',
  }),
);

Native token counting

ts
import { getLlama } from 'node-llama-cpp';
import { LlamaCppNativeCounter } from '@graphorin/provider-llamacpp-node';
import { setGlobalTokenCounter } from '@graphorin/provider/counters';

// Load the GGUF model via node-llama-cpp so the counter can reuse its tokenizer.
const modelPath = '/path/to/qwen2.5-7b.gguf';
const llama = await getLlama();
const loadedGgufModel = await llama.loadModel({ modelPath });

setGlobalTokenCounter(
  new LlamaCppNativeCounter({
    model: loadedGgufModel,
    modelPath, // keys upstream cache invalidation on the model file path
  }),
);

The counter wraps the loaded model's own GGUF tokenizer, so counts are exact for that model: strictly tighter than the cl100k_base proxy used by the HTTP-shaped adapters.
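
For intuition, the counter's result is just the length of the model's own token sequence. The illustrative snippet below reuses loadedGgufModel from the block above and makes the same model.tokenize(text) call the counter delegates to.

ts
// Illustrative: the same call LlamaCppNativeCounter delegates to internally.
const tokens = loadedGgufModel.tokenize('How many tokens is this?');
console.log(tokens.length); // exact count under the loaded model's tokenizer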

HITL durable-resume tradeoff

The in-process adapter does not survive a process restart mid-stream — the model context lives in the running process and is lost on exit. For human-in-the-loop workflows that need durable mid-stream resume across restarts, prefer one of the HTTP-shaped adapters instead:

  • ollamaAdapter — Ollama HTTP daemon
  • llamaCppServerAdapter — upstream llama-server binary
  • openAICompatibleAdapter — LMStudio / LocalAI / vLLM / Together-style endpoints
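
A minimal swap keeps the quick-start shape and changes only the adapter. This is a hedged sketch: the '@graphorin/provider-ollama' import path and the option names are assumptions inferred from this package family's naming, not documented API; only the adapter name comes from the list above.

ts
// Hedged sketch: the import path and options below are assumptions,
// not documented surface.
import { createProvider } from '@graphorin/provider';
import { ollamaAdapter } from '@graphorin/provider-ollama';

const provider = createProvider(
  ollamaAdapter({
    baseUrl: 'http://127.0.0.1:11434', // Ollama's default local endpoint
    model: 'qwen2.5:7b',
  }),
);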

GGUF model provenance

.gguf model files are not signed by default. Pull only from trusted publishers and verify the SHA-256 of the downloaded file against the publisher's manifest:

  • huggingface.co/ggml-org
  • huggingface.co/TheBloke
  • huggingface.co/bartowski
  • huggingface.co/unsloth
  • huggingface.co/Qwen (official Qwen distributions)
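
A minimal digest check in Node is sketched below; the expected value is a placeholder for the digest taken from the publisher's manifest.

ts
// Stream the file through node:crypto and compare against the published digest.
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

async function sha256(path: string): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of createReadStream(path)) hash.update(chunk);
  return hash.digest('hex');
}

const expected = '<sha-256 from the publisher manifest>'; // placeholder
const actual = await sha256('/path/to/qwen2.5-7b-instruct-q4_k_m.gguf');
if (actual !== expected) throw new Error(`GGUF digest mismatch: ${actual}`);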

Full provenance enforcement (allowlist + Sigstore signature verification) is a future Graphorin work item; v0.1 documents the discipline rather than enforcing it at runtime.

Project metadata

The adapter declares trust: 'loopback' permanently because the model lives in the same trust boundary as the host process; this mirrors @graphorin/embedder-transformersjs, the in-process embedder that sits in the same trust boundary.


Classes

  • LlamaCppNativeCounter — Counter that delegates to model.tokenize(text) from the loaded GGUF instance. Cache invalidation is keyed on the model file path (when supplied), so swapping models invalidates per-message caches upstream.

Interfaces

  • LlamaCppNativeCounterOptions — Options for LlamaCppNativeCounter.
  • LlamaCppNodeAdapterOptions — Options accepted by llamaCppNodeAdapter.
  • LlamaCppNodeRuntimeOverrides — Test-only shape for injecting fixture-driven runtime behaviour.
  • LlamaInstance — Llama engine instance (returned by getLlama()).
  • LlamaModelInstance — Loaded GGUF model.
  • LlamaSessionInstance — Loaded chat session capable of streaming responses.

Variables

  • VERSION — Canonical version constant. Mirrors the package.json version.

Functions

  • llamaCppNodeAdapter — Build a Graphorin Provider backed by an in-process GGUF model. The first call lazily loads the node-llama-cpp peer dependency and the model file; subsequent calls reuse the cached instances.