Getting Started | Opilot — Ollama for GitHub Copilot

Prerequisites

Install Ollama

Download and install Ollama from ollama.ai/download.

After installation, start the Ollama server:

bash

# Option A: open the Ollama desktop app (macOS / Windows)
# Option B: run from the terminal
ollama serve

Verify it's running:

bash

curl http://localhost:11434
# Expected: "Ollama is running"

(Optional) Log in to Ollama Cloud

Cloud models are hosted by Ollama and require authentication:

bash

ollama login

Follow the prompts in your browser. Once logged in, Cloud models appear in the sidebar's Cloud Models panel.

Install GitHub Copilot Chat

The extension integrates with the GitHub Copilot Chat extension. Make sure it is installed and signed in.

Install Opilot

Search for "Opilot" in the VS Code Extensions panel, or run:

bash

code --install-extension selfagency.opilot

The Ollama activity bar icon (🦙) will appear automatically on next startup.

Step 1: Pull Your First Model

Click the Ollama icon in the activity bar to open the sidebar. Navigate to the Library panel, find a model, and click the Pull (⬇) button on any variant.

Recommended starter models:

Use Case	Model	Size
General chat	`llama3.2:3b`	~2 GB
Code chat (fast)	`qwen2.5-coder:7b`	~5 GB
Code completions	`qwen2.5-coder:1.5b`	~1 GB
Vision	`llava:7b`	~5 GB
Reasoning	`deepseek-r1:7b`	~5 GB

While a model is pulling you'll see streaming progress inside a VS Code notification. You can continue working — pulls happen in the background.

Step 2: Start a Conversation

Via the Model Picker

Open GitHub Copilot Chat (Ctrl+Alt+I / Cmd+Alt+I)
Click the model selector dropdown in the chat input
Select any model prefixed with 🦙 Ollama
Start chatting normally

The model picker shows all locally cached Ollama models alongside the standard Copilot models.

Via the `@ollama` Chat Participant

Type @ollama at the beginning of a Copilot Chat message to direct it straight to your local Ollama instance:

text

@ollama explain the design of this TypeScript module

The @ollama participant is sticky — once you invoke it in a thread, subsequent messages in the thread continue using it automatically.

→ See @ollama Chat Participant for the full guide.

Step 3: (Optional) Enable Inline Completions

To get real-time code suggestions as you type, configure a completion model:

Open Settings (Ctrl+, / Cmd+,) and search for ollama.completionModel
Enter a small, fast model name (e.g. qwen2.5-coder:1.5b)
Make sure ollama.enableInlineCompletions is true

Smaller quantized models (1–3 B parameters) respond quickly enough for a smooth completion experience. See Inline Completions for tuning tips.

Step 4: (Optional) Create a Custom Modelfile

Modelfiles let you define a custom model persona, system prompt, and inference parameters. Click the + button in the Modelfiles sidebar panel to launch the wizard.

→ See Modelfile Manager for the full guide.

Remote Ollama Instances

To connect to a remote Ollama server, update the ollama.host setting:

Open Settings and search for ollama.host
Enter your remote URL (e.g., http://my-server:11434)
If the server requires authentication, run "Ollama: Manage Ollama Auth Token" from the Command Palette to store your bearer token securely

→ See Settings for all available configuration options.

Prerequisites ​

Install Ollama ​

(Optional) Log in to Ollama Cloud ​

Install GitHub Copilot Chat ​

Install Opilot ​

Step 1: Pull Your First Model ​

Step 2: Start a Conversation ​

Via the Model Picker ​

Via the @ollama Chat Participant ​

Step 3: (Optional) Enable Inline Completions ​

Step 4: (Optional) Create a Custom Modelfile ​

Remote Ollama Instances ​

Prerequisites

Install Ollama

(Optional) Log in to Ollama Cloud

Install GitHub Copilot Chat

Install Opilot

Step 1: Pull Your First Model

Step 2: Start a Conversation

Via the Model Picker

Via the `@ollama` Chat Participant

Step 3: (Optional) Enable Inline Completions

Step 4: (Optional) Create a Custom Modelfile

Remote Ollama Instances