Nous.Providers.LlamaCpp (nous v0.16.5)
LlamaCpp NIF-based provider for local LLM inference.
Runs GGUF models directly in-process via llama_cpp_ex NIF bindings.
No HTTP server needed.
Requires optional dep: {:llama_cpp_ex, "~> 0.6.5"}
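As a rough illustration of where the optional dependency goes, here is a minimal mix.exs sketch. The application name :my_app, its version, and the :nous version requirement (taken from the version shown in this page's title) are assumptions for illustration only; the :llama_cpp_ex tuple is the one stated above.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  # :my_app and its version are hypothetical placeholders.
  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  defp deps do
    [
      # Assumed requirement, based on the nous version shown in the page title.
      {:nous, "~> 0.16.5"},
      # Optional dependency required by the LlamaCpp provider (from the docs above).
      {:llama_cpp_ex, "~> 0.6.5"}
    ]
  end
end
```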
Usage
# Load model once at app start
:ok = LlamaCppEx.init()
{:ok, llm} = LlamaCppEx.load_model("model.gguf", n_gpu_layers: -1)
# Use with Nous
agent = Nous.new("llamacpp:local",
  llamacpp_model: llm,
  instructions: "You are helpful."
)
{:ok, result} = Nous.run(agent, "What is Elixir?")
Configuration
The llamacpp_model (the loaded model reference) must be passed via options
when creating the model or agent. It is stored in default_settings.
No API key or base URL is needed since inference runs locally via NIFs.
Settings Mapping
Nous settings are mapped to LlamaCppEx options:
| Nous Setting | LlamaCppEx Option | Description |
|---|---|---|
| :temperature | :temp | Sampling temperature |
| :max_tokens | :max_tokens | Maximum tokens to generate |
| :top_p | :top_p | Nucleus sampling |
| :json_schema | :json_schema | Constrained JSON output |
| :enable_thinking | :enable_thinking | Enable/disable thinking tokens |
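The mapping in the table can be pictured as a simple key rename. The sketch below is not the provider's actual code: the module name SettingsMappingSketch and the choice to drop unlisted keys are assumptions made for illustration.

```elixir
defmodule SettingsMappingSketch do
  # Nous setting key => LlamaCppEx option key, as listed in the table.
  @mapping %{
    temperature: :temp,
    max_tokens: :max_tokens,
    top_p: :top_p,
    json_schema: :json_schema,
    enable_thinking: :enable_thinking
  }

  # Renames known keys and returns a keyword list of LlamaCppEx-style options.
  # Dropping keys that the table does not list is an assumption of this sketch.
  def to_llama_opts(settings) when is_map(settings) do
    for {key, value} <- settings, Map.has_key?(@mapping, key) do
      {Map.fetch!(@mapping, key), value}
    end
  end
end

opts = SettingsMappingSketch.to_llama_opts(%{temperature: 0.7, max_tokens: 256})
IO.inspect(Enum.sort(opts))
```

Only :temperature changes name (to :temp); the other settings pass through under the same key.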
Thinking Models
Models like Qwen3 emit <think>...</think> tags by default. To disable:
agent = Nous.new("llamacpp:local",
  llamacpp_model: llm,
  model_settings: %{enable_thinking: false}
)
Or via generate_text:
{:ok, text} = Nous.generate_text("llamacpp:local", "Hello",
  llamacpp_model: llm,
  enable_thinking: false
)
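If thinking is left enabled, the response text may contain the <think>...</think> block described above. The following is a hedged post-processing sketch that strips such a block from a string. It is not part of Nous or LlamaCppEx, and the module name ThinkTagSketch is hypothetical.

```elixir
defmodule ThinkTagSketch do
  # Removes every <think>...</think> block and any whitespace that follows it.
  # The `s` modifier lets `.` match newlines, since thinking text spans lines.
  def strip(text) when is_binary(text) do
    String.replace(text, ~r/<think>.*?<\/think>\s*/s, "")
  end
end

IO.puts(ThinkTagSketch.strip("<think>\nplan the answer\n</think>\n\nElixir is a language."))
```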
Functions
Get the API key from options, environment, or application config.
Lookup order:
- :api_key option passed directly
- Environment variable (LLAMACPP_MODEL_PATH)
- Application config:
  config :nous, :llamacpp, api_key: "..."
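The lookup order above amounts to a first-match-wins chain. A minimal sketch follows, assuming the application config lives under the :llamacpp key of the :nous application. The module name KeyLookupSketch is hypothetical and this is not the library's implementation.

```elixir
defmodule KeyLookupSketch do
  # Returns the first non-nil value, checking sources in the documented order:
  # 1. the :api_key option, 2. the environment variable, 3. application config.
  def api_key(opts \\ []) do
    Keyword.get(opts, :api_key) ||
      System.get_env("LLAMACPP_MODEL_PATH") ||
      Keyword.get(Application.get_env(:nous, :llamacpp, []), :api_key)
  end
end

# An explicitly passed option takes precedence over the other sources.
IO.inspect(KeyLookupSketch.api_key(api_key: "from-opts"))
```

Because inference runs locally, all three sources may be empty, in which case this sketch returns nil.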
Get the base URL from options, application config, or default.
Lookup order:
- :base_url option passed directly
- Application config:
  config :nous, :llamacpp, base_url: "..."
- Default: local
Count tokens in messages (rough estimate).
Override this in your provider for more accurate counting.
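The documentation does not state which heuristic the rough estimate uses. As an illustration of what such an estimate could look like, the sketch below assumes about four characters per token and messages shaped as maps with a string :content field. Both are assumptions, and TokenEstimateSketch is a hypothetical module.

```elixir
defmodule TokenEstimateSketch do
  # Assumption: roughly 4 characters per token. Real tokenizers differ.
  @chars_per_token 4

  # Sums the character length of every message's content, then divides by
  # the assumed characters-per-token ratio (integer division).
  def count_tokens(messages) when is_list(messages) do
    messages
    |> Enum.map(fn %{content: content} -> String.length(content) end)
    |> Enum.sum()
    |> div(@chars_per_token)
  end
end

IO.inspect(TokenEstimateSketch.count_tokens([%{content: "What is Elixir?"}]))
```

An override for a local model would more plausibly call the model's own tokenizer, which is why the default is described as rough.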
High-level request with message conversion, telemetry, and error wrapping.
Default implementation that:
- Converts messages to provider format
- Builds request params
- Calls chat/2
- Parses response
- Emits telemetry events
- Wraps errors
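The steps listed above can be sketched as a single function. Everything here is a hypothetical stand-in: the module RequestPipelineSketch, the message and response shapes, the chat_fun argument (playing the role of chat/2), and the emit callback (playing the role of telemetry) are assumptions, not Nous internals.

```elixir
defmodule RequestPipelineSketch do
  # chat_fun stands in for the provider's chat call; emit stands in for telemetry.
  def request(messages, settings, chat_fun, emit \\ fn _event, _meta -> :ok end) do
    # Convert messages to the (assumed) provider format.
    formatted = Enum.map(messages, &convert_message/1)
    # Build request params.
    params = %{messages: formatted, settings: settings}

    result =
      case chat_fun.(params) do
        # Parse a successful raw response.
        {:ok, raw} -> {:ok, parse(raw)}
        # Wrap provider errors in a tagged tuple.
        {:error, reason} -> {:error, {:provider_error, reason}}
      end

    # Emit an event describing the outcome.
    emit.(:request_done, %{success: match?({:ok, _}, result)})
    result
  end

  defp convert_message(%{role: role, content: content}) do
    %{"role" => to_string(role), "content" => content}
  end

  defp parse(raw) when is_binary(raw), do: %{text: String.trim(raw)}
end

fake_chat = fn _params -> {:ok, " Elixir is a functional language. "} end
IO.inspect(RequestPipelineSketch.request([%{role: :user, content: "What is Elixir?"}], %{}, fake_chat))
```

Passing the chat function as an argument keeps the sketch runnable without a loaded model.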
High-level streaming request with message conversion and telemetry.