Using Ollama With Roo Code

Roo Code supports running models locally using Ollama. This provides privacy, offline access, and potentially lower costs, but requires more setup and a powerful computer.

Website: https://ollama.com/


Setting up Ollama

  1. Download and Install Ollama: Download the Ollama installer for your operating system from the Ollama website and follow the installation instructions. Make sure the Ollama server is running:

    ollama serve
  2. Download a Model: Ollama supports many different models. You can find a list of available models on the Ollama website. Some recommended models for coding tasks include:

    • codellama:7b-code (good starting point, smaller)
    • codellama:13b-code (better quality, larger)
    • codellama:34b-code (even better quality, very large)
    • qwen2.5-coder:32b
    • mistral:7b-instruct (good general-purpose model)
    • deepseek-coder:6.7b-base (good for coding tasks)
    • llama3:8b-instruct-q5_1 (good for general tasks)

    To download a model, open your terminal and run:

    ollama pull <model_name>

    For example:

    ollama pull qwen2.5-coder:32b
  3. Configure the Model: Set the model’s context window in Ollama and save it under a new name. Roo Code automatically reads the model’s reported context window from Ollama and passes it as num_ctx; no Roo-side context size setting is required for the Ollama provider.

    Load the model (we will use qwen2.5-coder:32b as an example):

    ollama run qwen2.5-coder:32b

    Change context size parameter:

    /set parameter num_ctx 32768

    Save the model with a new name:

    /save your_model_name
  4. Configure Roo Code:

    • Open the Roo Code sidebar.
    • Click the settings gear icon.
    • Select "ollama" as the API Provider.
    • Enter the model tag or saved name from the previous step (e.g., your_model_name). You can confirm which models your Ollama server exposes with the sketch after this list.
    • (Optional) Configure the base URL if you're running Ollama on a different machine (see the remote-access sketch after this list). The default is http://localhost:11434.
    • (Optional) Enter an API Key if your Ollama server requires authentication.
    • (Advanced) Roo uses Ollama's native API by default for the "ollama" provider. An OpenAI-compatible /v1 handler also exists but isn't required for typical setups.
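
Before pointing Roo Code at a model name, it can help to confirm that the Ollama server is reachable and that your saved model is listed. A quick check, assuming the default local address (http://localhost:11434):

    # List models known to the local Ollama server (your saved name should appear)
    ollama list

    # Or query the native API directly; an error or no response means the
    # server isn't reachable at this address
    curl http://localhost:11434/api/tags

If your saved model (e.g., your_model_name) does not appear, re-run the /save step above.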

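If Ollama runs on another machine, the server has to listen on an address Roo Code can reach; by default it binds only to localhost. A minimal sketch, assuming the server's LAN address is 192.168.1.50 (a placeholder):

    # On the machine running Ollama: bind the server to all interfaces
    OLLAMA_HOST=0.0.0.0:11434 ollama serve

    # In Roo Code, set the base URL to that machine's address, e.g.:
    # http://192.168.1.50:11434

Depending on how Ollama was installed, you may need to set OLLAMA_HOST in the service configuration instead of on the command line.
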
Tips and Notes

  • Resource Requirements: Running large language models locally can be resource-intensive. Make sure your computer meets the minimum requirements for the model you choose.
  • Model Selection: Experiment with different models to find the one that best suits your needs.
  • Offline Use: Once you've downloaded a model, you can use Roo Code offline with that model.
  • Token Tracking: Roo Code tracks token usage for models run via Ollama, helping you monitor consumption.
  • Ollama Documentation: Refer to the Ollama documentation for more information on installing, configuring, and using Ollama.

Troubleshooting

Out of Memory (OOM) on First Request

Symptoms

  • First request from Roo fails with an out-of-memory error
  • GPU/CPU memory usage spikes when the model first loads
  • Works after you manually start the model in Ollama

Cause

If no model instance is running, Ollama spins one up on demand. During that cold start it may allocate a larger context window than expected, which increases memory usage and can exceed available VRAM or RAM. This is Ollama startup behavior, not a Roo Code bug.

Fixes

  1. Preload the model

    ollama run <model-name>

    Keep it running, then issue the request from Roo.

  2. Pin the context window (num_ctx)

    • Option A (interactive session, then save):
      # inside `ollama run <base-model>`
      /set parameter num_ctx 32768
      /save <your_model_name>
    • Option B (Modelfile; see the complete sketch after this list):
      PARAMETER num_ctx 32768
      Then re-create the model:
      ollama create <your_model_name> -f Modelfile
  3. Ensure the model's context window is pinned: Save your Ollama model with an appropriate num_ctx (e.g., via /set + /save, or a Modelfile). Roo reads this automatically and passes it as num_ctx; there is no Roo-side context size setting for the Ollama provider.

  4. Use smaller variants: If GPU memory is limited, use a smaller quantization (e.g., q4 instead of q5) or a smaller parameter count (e.g., 7B/13B instead of 32B).

  5. Restart after an OOM

    ollama ps
    ollama stop <model-name>
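
Option B above shows only the parameter line; a full Modelfile also needs a FROM line naming the base model. A minimal sketch, using qwen2.5-coder:32b as the example base and your_model_name as the saved name:

    # Modelfile
    FROM qwen2.5-coder:32b
    PARAMETER num_ctx 32768

Build it and point Roo Code at the new name:

    ollama create your_model_name -f Modelfile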

Quick checklist

  • Model is running before Roo request
  • num_ctx pinned and the model saved with it (Modelfile, or /set + /save); Roo picks this up automatically
  • Model fits available VRAM/RAM
  • No leftover Ollama processes
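
A couple of commands cover most of this checklist (exact output and flag support can vary between Ollama versions):

    # Show loaded models, their memory footprint, and how they are split
    # between GPU and CPU
    ollama ps

    # Confirm the saved model carries the pinned context window
    ollama show your_model_name --parameters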