Nous.Providers.LlamaCpp (nous v0.16.5)


NIF-based llama.cpp provider for local LLM inference.

Runs GGUF models directly in-process through the llama_cpp_ex NIF bindings, so no HTTP server is needed.

Requires the optional dependency {:llama_cpp_ex, "~> 0.6.5"} in your project's mix.exs deps.

Usage

# Load model once at app start
:ok = LlamaCppEx.init()
{:ok, llm} = LlamaCppEx.load_model("model.gguf", n_gpu_layers: -1)

# Use with Nous
agent = Nous.new("llamacpp:local",
  llamacpp_model: llm,
  instructions: "You are helpful."
)

{:ok, result} = Nous.run(agent, "What is Elixir?")

Configuration

The :llamacpp_model option (the model reference returned by LlamaCppEx.load_model) must be passed when creating the model or agent. It is stored in the model's default_settings.

No API key or base URL is needed since inference runs locally via NIFs.

Settings Mapping

Nous settings are mapped to LlamaCppEx options:

Nous setting       LlamaCppEx option    Description
:temperature       :temp                Sampling temperature
:max_tokens        :max_tokens          Maximum tokens to generate
:top_p             :top_p               Nucleus sampling
:json_schema       :json_schema         Constrained JSON output
:enable_thinking   :enable_thinking     Enable/disable thinking tokens
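The mapping in the table above is essentially a key translation. The sketch below is a hypothetical helper (SettingsMappingSketch.to_llama_opts/1 is not a Nous function); it assumes settings arrive as a map and that unknown or nil settings are simply dropped, which may differ from what the provider actually does.

```elixir
defmodule SettingsMappingSketch do
  @moduledoc """
  Hypothetical helper illustrating the Nous -> LlamaCppEx settings mapping.
  Not part of Nous; the real provider performs its own translation.
  """

  # Nous setting key => LlamaCppEx option key, following the mapping table.
  @mapping %{
    temperature: :temp,
    max_tokens: :max_tokens,
    top_p: :top_p,
    json_schema: :json_schema,
    enable_thinking: :enable_thinking
  }

  @doc "Translate Nous settings into a LlamaCppEx keyword list, dropping unknown or nil entries."
  @spec to_llama_opts(map()) :: keyword()
  def to_llama_opts(settings) when is_map(settings) do
    for {key, value} <- settings,
        not is_nil(value),
        Map.has_key?(@mapping, key) do
      {Map.fetch!(@mapping, key), value}
    end
  end
end
```

Note that false is kept (only nil is dropped), so %{enable_thinking: false} still reaches LlamaCppEx as enable_thinking: false.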

Thinking Models

Thinking models such as Qwen3 emit <think>...</think> blocks by default. To disable thinking, set enable_thinking: false in model_settings:

agent = Nous.new("llamacpp:local",
  llamacpp_model: llm,
  model_settings: %{enable_thinking: false}
)

Or pass the option directly to Nous.generate_text:

{:ok, text} = Nous.generate_text("llamacpp:local", "Hello",
  llamacpp_model: llm,
  enable_thinking: false
)
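If you would rather let the model reason but show only the final answer, one option is to strip the <think>...</think> block after generation. This is a hypothetical helper, not part of Nous, and it assumes the tags appear literally in the returned text.

```elixir
defmodule ThinkStripSketch do
  @moduledoc """
  Hypothetical post-processing helper that removes <think>...</think>
  blocks from generated text. Not part of Nous.
  """

  @spec strip_thinking(String.t()) :: String.t()
  def strip_thinking(text) when is_binary(text) do
    # Non-greedy match with the `s` flag so multi-line reasoning is removed too.
    text
    |> String.replace(~r{<think>.*?</think>}s, "")
    |> String.trim()
  end
end
```

Usage: ThinkStripSketch.strip_thinking(text) on the string returned by Nous.generate_text when thinking is left enabled.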

Summary

Functions

api_key(opts \\ [])
Get the API key from options, environment, or application config.

base_url(opts \\ [])
Get the base URL from options, application config, or default.

count_tokens(messages)
Count tokens in messages (rough estimate).

request(model, messages, settings)
High-level request with message conversion, telemetry, and error wrapping.

request_stream(model, messages, settings)
High-level streaming request with message conversion and telemetry.

Functions

api_key(opts \\ [])

@spec api_key(keyword()) :: String.t() | nil

Get the API key from options, environment, or application config.

Lookup order:

  1. :api_key option passed directly
  2. Environment variable (LLAMACPP_MODEL_PATH)
  3. Application config: config :nous, :llamacpp, api_key: "..."
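The lookup order above amounts to a chain of fallbacks. A minimal sketch, assuming the application config is a keyword list under config :nous, :llamacpp; ApiKeyLookupSketch is a hypothetical module, not the Nous source.

```elixir
defmodule ApiKeyLookupSketch do
  @moduledoc """
  Illustrative sketch of the documented api_key/1 lookup order.
  The real Nous implementation may differ in details.
  """

  @env_var "LLAMACPP_MODEL_PATH"

  @spec api_key(keyword()) :: String.t() | nil
  def api_key(opts \\ []) do
    # 1. explicit :api_key option, 2. environment variable, 3. application config
    Keyword.get(opts, :api_key) ||
      System.get_env(@env_var) ||
      Application.get_env(:nous, :llamacpp, [])[:api_key]
  end
end
```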

base_url(opts \\ [])

@spec base_url(keyword()) :: String.t()

Get the base URL from options, application config, or default.

Lookup order:

  1. :base_url option passed directly
  2. Application config: config :nous, :llamacpp, base_url: "..."
  3. Default: local
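The base_url/1 lookup follows the same fallback pattern, ending in a default. BaseUrlLookupSketch is hypothetical, and treating the default as the literal string "local" is an assumption read from the list above.

```elixir
defmodule BaseUrlLookupSketch do
  # Hypothetical sketch of the documented base_url/1 lookup order; not Nous code.
  # The "local" default is an assumption about the literal value.
  @default "local"

  @spec base_url(keyword()) :: String.t()
  def base_url(opts \\ []) do
    # The config lookup only runs when no :base_url option is given.
    Keyword.get_lazy(opts, :base_url, fn ->
      :nous
      |> Application.get_env(:llamacpp, [])
      |> Keyword.get(:base_url, @default)
    end)
  end
end
```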

count_tokens(messages)

@spec count_tokens(list()) :: integer()

Count tokens in messages (rough estimate).

Override this in your provider for more accurate counting.
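A common rough estimate is about four characters per token. The sketch below uses that heuristic; the ratio, the accepted message shapes (plain strings or maps with a :content key), and TokenEstimateSketch itself are assumptions rather than the Nous implementation. Exact counts require the model's own tokenizer.

```elixir
defmodule TokenEstimateSketch do
  @moduledoc """
  Hypothetical rough token estimate using ~4 characters per token.
  Not the Nous implementation.
  """

  @chars_per_token 4

  @spec count_tokens(list()) :: integer()
  def count_tokens(messages) when is_list(messages) do
    chars =
      messages
      |> Enum.map(&message_text/1)
      |> Enum.map(&String.length/1)
      |> Enum.sum()

    # Ceiling division so any non-empty text counts as at least one token.
    div(chars + @chars_per_token - 1, @chars_per_token)
  end

  # Accepts plain strings or maps with a :content string (assumed shapes).
  defp message_text(text) when is_binary(text), do: text
  defp message_text(%{content: text}) when is_binary(text), do: text
  defp message_text(_other), do: ""
end
```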

request(model, messages, settings)

High-level request with message conversion, telemetry, and error wrapping.

Default implementation that:

  1. Converts messages to provider format
  2. Builds request params
  3. Calls chat/2
  4. Parses response
  5. Emits telemetry events
  6. Wraps errors
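The six steps above can be sketched as a with pipeline. Everything here is hypothetical: the chat function (standing in for chat/2) and the telemetry emitter are passed in so the sketch runs without llama_cpp_ex or the :telemetry library, and the message and response shapes are invented for illustration.

```elixir
defmodule RequestPipelineSketch do
  # Hypothetical sketch of the documented request steps; not Nous code.
  # chat_fun stands in for the provider's chat call, emit_fun for telemetry.

  def request(messages, settings, chat_fun, emit_fun) do
    start = System.monotonic_time()

    result =
      # 1. convert messages, 2. build params, 3. call chat, 4. parse response
      with {:ok, converted} <- convert_messages(messages),
           params = build_params(converted, settings),
           {:ok, raw} <- chat_fun.(params) do
        parse_response(raw)
      end

    # 5. emit a telemetry-style event with duration and outcome
    emit_fun.(:request_stop, %{
      duration: System.monotonic_time() - start,
      ok: match?({:ok, _}, result)
    })

    # 6. wrap any failure in a uniform error tuple
    case result do
      {:ok, _} = ok -> ok
      {:error, reason} -> {:error, {:provider_error, reason}}
      other -> {:error, {:provider_error, other}}
    end
  end

  defp convert_messages(messages) do
    {:ok, Enum.map(messages, fn {role, text} -> %{role: role, content: text} end)}
  end

  defp build_params(messages, settings), do: Map.put(settings, :messages, messages)

  defp parse_response(%{text: text}) when is_binary(text), do: {:ok, text}
  defp parse_response(other), do: {:error, {:unexpected_response, other}}
end
```

Injecting the chat and emit functions keeps the pipeline testable with fakes; a real provider would call its NIF-backed chat function and a telemetry library instead.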

request_stream(model, messages, settings)

High-level streaming request with message conversion and telemetry.