
The @ollama chat participant provides a dedicated, history-aware conversational interface to your local Ollama models inside GitHub Copilot Chat.

Invoking @ollama

Type @ollama at the beginning of any Copilot Chat message:

text
@ollama explain the architecture of this TypeScript project

The participant is sticky — once invoked in a conversation thread, it continues handling subsequent messages automatically until you switch to a different participant.

How It Differs from the Model Picker

| | Model Picker | @ollama |
|---|---|---|
| Invocation | Select from dropdown | `@ollama` prefix |
| Sticky | Thread model stays selected | Yes, participant stays selected |
| Default model | Most recently selected | `ollama.completionModel` or first available |
| Tool calling | Yes | Yes |

Context and History

The @ollama participant passes the full conversation history from Copilot Chat to Ollama in each request, so it maintains context across turns:

text
@ollama what does the `activate` function do in this file?
# (reads through the response)
How does it handle errors?
# (follow-up — still using @ollama with full context)
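Conceptually, the history pass-through amounts to flattening prior turns into the messages array that Ollama's `/api/chat` endpoint accepts. A minimal sketch, using illustrative types rather than the extension's actual ones:

```typescript
// Hypothetical shapes; the real extension maps VS Code's chat history
// types, but the flattening logic looks roughly like this.
type Turn = { role: "user" | "assistant"; text: string };
type OllamaMessage = { role: string; content: string };

function buildMessages(
  history: Turn[],
  prompt: string,
  system?: string
): OllamaMessage[] {
  const messages: OllamaMessage[] = [];
  if (system) messages.push({ role: "system", content: system });
  // Replay every earlier turn so the model keeps full context.
  for (const turn of history) {
    messages.push({ role: turn.role, content: turn.text });
  }
  // The current question goes last.
  messages.push({ role: "user", content: prompt });
  return messages;
}
```

Because the entire history is resent on every turn, long threads consume more of the model's context window; starting a fresh thread resets that cost.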

Workspace Context

When VS Code provides workspace context (open files, selected code, terminal output), it is injected as XML-tagged context blocks at the beginning of the conversation. The extension extracts these tags and promotes them to a system message, so they influence replies without cluttering the conversation view.
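The extraction step can be sketched as a regex pass over the first user message. The tag names below are purely illustrative (the actual tags VS Code emits may differ); the point is that matched blocks move into a system message while the visible prompt stays clean:

```typescript
// Hypothetical tag names for illustration only.
const CONTEXT_TAG = /<(attachment|editor-context)\b[^>]*>[\s\S]*?<\/\1>/g;

function promoteContext(userText: string): { system: string; user: string } {
  // Collect the XML-tagged context blocks...
  const blocks = userText.match(CONTEXT_TAG) ?? [];
  // ...and remove them from what the user appears to have typed.
  const user = userText.replace(CONTEXT_TAG, "").trim();
  return { system: blocks.join("\n"), user };
}
```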

Tool Calling

For models with tool-calling capability (🛠 badge), the @ollama participant can invoke VS Code tools:

text
@ollama look at the test file for sidebar.ts and tell me what cases are missing

The tool loop runs inside the extension — the model emits tool call requests, VS Code executes them, and the results flow back for the next response turn. If a model rejects the tool schema (not all models support OpenAI-compatible function calling), the extension automatically retries the request without tools.
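The loop above can be sketched with its two dependencies injected, so neither the VS Code API nor the Ollama client is assumed: `chat` stands in for one model request, `execute` for one VS Code tool invocation.

```typescript
// Illustrative types; the real extension uses VS Code's tool-calling types.
type ToolCall = { name: string; args: unknown };
type ChatResult = { text: string; toolCalls: ToolCall[] };

function runToolLoop(
  chat: (toolResults: string[]) => ChatResult,
  execute: (call: ToolCall) => string,
  maxRounds = 5
): string {
  let results: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const reply = chat(results);
    // No tool calls means the model has produced its final answer.
    if (reply.toolCalls.length === 0) return reply.text;
    // Execute each requested tool; results feed the next request.
    results = reply.toolCalls.map(execute);
  }
  return ""; // safety valve: stop after maxRounds
}
```

The bounded round count is a common safeguard against a model that keeps requesting tools indefinitely; the actual limit used by the extension is not documented here.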

Vision

For models with vision capability (👁 badge), you can attach images directly in the chat input:

  • Drag an image file into the chat input
  • Paste an image from clipboard
text
@ollama what is shown in this screenshot?
[image attached]

Images are automatically stripped for models that do not support vision, so you can leave images in your message history without worrying about errors.
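The stripping step amounts to dropping the image field from every message before the request is sent. A minimal sketch, assuming a message shape similar to Ollama's (base64 images in an optional `images` array):

```typescript
type Message = { role: string; content: string; images?: string[] };

// If the model lacks vision, remove image attachments so stale
// screenshots in history cannot trigger request errors.
function stripImages(messages: Message[], modelHasVision: boolean): Message[] {
  if (modelHasVision) return messages;
  return messages.map(({ images, ...rest }) => rest);
}
```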

Thinking Models

For models that expose chain-of-thought reasoning (🧠 badge — DeepSeek-R1, Qwen QwQ, Kimi, etc.), responses are split into two collapsible sections:

  • Thinking — the model's internal reasoning, wrapped in <think>…</think> tags
  • Response — the final answer

This makes it easy to review the reasoning or skip straight to the answer.
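Since the source states the reasoning arrives wrapped in `<think>…</think>` tags, the split can be sketched as a small parser that peels a leading think block off the reply:

```typescript
function splitThinking(reply: string): { thinking: string; response: string } {
  // Match a <think>…</think> block at the start of the reply.
  const m = reply.match(/^\s*<think>([\s\S]*?)<\/think>\s*/);
  if (!m) return { thinking: "", response: reply };
  return { thinking: m[1].trim(), response: reply.slice(m[0].length) };
}
```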

Streaming

Responses stream token-by-token in real time. The extension uses an XML stream filter to safely handle model responses that may emit XML-like system tags inside their output, ensuring they are rendered as plain text rather than being interpreted as injected context.
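One way such a filter can work, sketched here as an assumption rather than the extension's actual implementation: escape `<` in emitted text, and buffer a trailing partial tag in case it is split across chunk boundaries.

```typescript
class StreamFilter {
  private pending = "";

  // Process one streamed chunk; returns text safe to render.
  push(chunk: string): string {
    let text = this.pending + chunk;
    this.pending = "";
    // Hold back a trailing "<..." that might be an unfinished tag
    // continuing in the next chunk.
    const open = text.lastIndexOf("<");
    if (open !== -1 && !text.includes(">", open)) {
      this.pending = text.slice(open);
      text = text.slice(0, open);
    }
    // Neutralize tags so they render as plain text.
    return text.replace(/</g, "&lt;");
  }

  // Emit whatever was held back once the stream ends.
  flush(): string {
    const rest = this.pending.replace(/</g, "&lt;");
    this.pending = "";
    return rest;
  }
}
```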

Error Handling

| Situation | Behavior |
|---|---|
| Ollama unreachable | Error message with "Open Settings" and "Open Logs" options |
| Model not found | Error with model name; use sidebar to pull model |
| Tool schema rejected | Automatic retry without tools |
| Model crash / OOM detected | Warning dialog with model name and "Open Logs" option |
| Rate limit (cloud models) | Error surfaced in chat |

Released under the MIT License