---
slug: copilot-with-ollama
title: Self-Hosted Copilot with Ollama & Public LLMs
authors: [jrunyan]
tags: [ai, workflow]
---
# Build it or Buy It?
Toward the end of 2024, there have been a lot of coding companions on the market. One I've liked is Cursor, which is a fork of VSCode and uses Claude as an underlying model for the companion, among a few others. Its free tier worked well, and you could game the system a bit by going through the signup process multiple times to get more use out of the tool. However, they started to patch some of these holes, and I had to look for a fully self-hosted solution, as the pricing model for a personal-use coding companion is, in my opinion, a little steep.

# Self-host it!
Hosting LLMs via LM Studio or Ollama is a pretty good way to make this project happen. Since I've already used LM Studio plenty before, I used Ollama to self-host my LLM.

Various VSCode plugins are built for self-hosted LLMs, and it's pretty simple to mix and match. In this experiment I'll try a few tools: [Llama Coder](https://marketplace.visualstudio.com/items?itemName=ex3ndr.llama-coder), [Code GPT](https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt), and [twinny](https://marketplace.visualstudio.com/items?itemName=rjmacarthy.twinny).

## Running a model
It's pretty easy to run a model in Ollama. First, check which models you have locally with `ollama list`. Then use `ollama run` to download (if necessary) and run your intended model.

```
ollama list
ollama run stable-code:3b-code-q4_0
```
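Running a model also brings up Ollama's background server (port 11434 by default). If you want to confirm it's up, listing the local models over HTTP is a quick check:
```
curl http://localhost:11434/api/tags
```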
## Querying the model directly

By default, you can interface with the LLM directly from the interactive prompt. The inference is quite fast.
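For reference, a minimal interactive session looks something like this (the streamed answer is omitted here; `/bye` exits the prompt):
```
ollama run stable-code:3b-code-q4_0
>>> Write a hello world program in Java.
>>> /bye
```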
Querying via the API is also simple. The run command starts a server, much like in LM Studio, which exposes REST endpoints (including OpenAI-style ones) that you can curl. For example, using the native API:
```
curl http://localhost:11434/api/generate -d '{
  "model": "stable-code:3b-code-q4_0",
  "prompt": "Write a hello world program in Java.",
  "stream": false
}'
```
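If your tooling speaks the OpenAI API, the OpenAI-compatible endpoints live under `/v1/`; a minimal sketch of a chat completion against the same model:
```
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "stable-code:3b-code-q4_0",
  "messages": [{"role": "user", "content": "Write a hello world program in Java."}]
}'
```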
Packages for Ollama are available in most programming languages if you want to use the OpenAI-style API programmatically.
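For instance, Ollama publishes an official Python client; installing it is a one-liner (Python here is just one option among many):
```
pip install ollama
```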
# Llama Coder Plugin

The Llama Coder plugin does tab completion via your local LLM. Luckily, it expects a default Ollama setup, which we just configured. All we need is the `ollama` CLI installed; the extension should be able to detect it automatically once installed.
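A quick sanity check that the CLI is on your PATH and the model from earlier is present:
```
ollama --version
ollama list
```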
## Settings

Next we'll visit the plugin page, click on the settings wheel, and select Settings. For a default installation, nothing needs to be changed.
## Usage

When writing code, press the space bar and pause for a moment; the LLM is fed the surrounding context and an inline suggestion appears.
## Resources

[Andrea Grandi's dive into this topic](https://www.andreagrandi.it/posts/self-hosting-copilot-replacement/)