this post was submitted on 12 Mar 2025
49 points (81.0% liked)

Selfhosted


Wondering about services to test on either a 16 GB RAM "AI capable" arm64 board or on a laptop with a modern RTX GPU. Only looking for open-source options, but curious to hear what people say. Cheers!

[–] [email protected] 4 points 6 days ago (1 children)

LM Studio is pretty much the standard. I think it's open source except for the UI. Even if you don't end up using it long-term, it's great for getting used to a lot of the models.

Otherwise there's OpenWebUI, which I'd imagine would work as a Docker Compose setup, since I think there are ARM images for both OWUI and Ollama.
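A minimal Compose sketch for that pairing could look something like this (image tags, ports, and the volume layout are assumptions based on the projects' published images; check the OpenWebUI and Ollama docs for current values):

```yaml
# Sketch: OpenWebUI + Ollama via Docker Compose (details are assumptions).
services:
  ollama:
    image: ollama/ollama          # multi-arch image, includes arm64
    volumes:
      - ollama:/root/.ollama      # persist downloaded models
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"               # UI at http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```

With both containers up, you'd pull a model with something like `docker compose exec ollama ollama pull qwen2.5-coder:7b` and it should appear in the OpenWebUI model picker.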

[–] [email protected] 2 points 2 days ago (1 children)

Well they are fully closed source except for the open source project they are a wrapper on. The open source part is llama.cpp

[–] [email protected] 1 points 2 days ago (1 children)

Fair enough, but it's damn handy and simple to use. And I don't know how to do speculative decoding with Ollama, which massively speeds up the models for me.
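For reference, since the open-source part is llama.cpp, you can get speculative decoding there directly with its bundled server. A sketch (the model file names are hypothetical, and flag names are from recent llama.cpp builds, so check `llama-server --help`):

```shell
# Sketch: llama.cpp server with a small draft model for speculative
# decoding. Paths are hypothetical; flags are assumptions from recent
# llama.cpp builds.
llama-server \
  --model models/qwen2.5-coder-32b-q4_k_m.gguf \
  --model-draft models/qwen2.5-coder-0.5b-q8_0.gguf \
  --draft-max 16 \
  --host 127.0.0.1 --port 8080
```

The idea is that the small draft model proposes several tokens cheaply and the big model verifies them in one pass, so output is identical to the big model alone but often much faster.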

[–] [email protected] 1 points 2 days ago (1 children)

Their software is pretty nice. That's what I'd recommend to someone who doesn't want to tinker. It's just a shame they don't want to open source it, so we have to reinvent the wheel ten times. If you're willing to tinker a bit, koboldcpp + OpenWebUI/LibreChat is a pretty nice combo.
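That combo could be wired up roughly like this (model paths are hypothetical, and the exact flag names are assumptions, so check `koboldcpp --help`):

```shell
# Sketch: koboldcpp with a draft model, fronted by OpenWebUI.
# Paths are hypothetical; flag names are assumptions.
python koboldcpp.py \
  --model models/qwen2.5-coder-32b-q4_k_m.gguf \
  --draftmodel models/qwen2.5-coder-0.5b-q8_0.gguf \
  --contextsize 8192 \
  --port 5001
# Then, in OpenWebUI, add an OpenAI-compatible connection pointing at
# http://localhost:5001/v1 to use koboldcpp as the backend.
```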

[–] [email protected] 1 points 2 days ago (1 children)

That koboldcpp is pretty interesting. Looks like I can load a draft model for speculative decoding, as well as a pile of other things.

What local models have you been using for coding? I've been disappointed with things like deepseek-coder and qwen-coder; they're not even a patch on Claude, but that damn Anthropic cost has been killing me.

[–] [email protected] 1 points 1 day ago (1 children)

As much as I'd like to praise the open-weight models, nothing comes close to Claude Sonnet in my experience either. I use local models when the info is sensitive and Claude when the problem requires being somewhat competent.

What setup do you use for coding? I might have a tip for minimizing Claude cost, depending on what your setup is.

[–] [email protected] 2 points 1 day ago (1 children)

I'm using VSCode/RooCode with the Gosucoder short prompt, with Requesty providing models. Generally I'll use R1 to outline a project and Claude to implement. The short prompt seems to reduce the context quite a bit, and hence the cost. I've heard about Cursor but haven't tried it yet.

When you're using local models, which ones are you using? The ones I mentioned don't seem to give me much I can use, but I'm also probably asking more of them because I've seen what Claude can do. It might also be a problem with how RooCode uses them, though when I just jump into a chat and ask for code directly, I don't get much better results.

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago)

If you're willing to pay $10 a month, you should get GitHub Copilot; it provides near-unlimited Claude 3.5 usage. RooCode can hook into the GitHub Copilot API and use it for its generations.

I use Qwen Coder and Mistral Small locally too. They work OK, but they're nowhere near GPT/Claude in terms of response quality.