this post was submitted on 02 Apr 2025
47 points (88.5% liked)

Consumer GPUs to run LLMs (lemmy.dbzer0.com)
submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]
 

Not sure if this is the right place, if not please let me know.

GPU prices in the US have been a horrific bloodbath with the scalpers recently. So for this discussion, let's keep it to MSRP and the lucky people who actually managed to afford those insane MSRPs + managed to actually find the GPU they wanted.

Which GPU are you using to run which LLMs? How is the performance of the models you've selected? On average, what size of LLM (7B, 14B, 20-24B, etc.) can you run smoothly on your GPU?

What GPU do you recommend for a decent amount of VRAM vs. price (MSRP)? If you're using a top-of-the-line RX 7900 XTX/4090/5090 with 24+ GB of VRAM, comment below with some performance estimates too.

My use case: code assistants for Terraform plus general shell and YAML, plain chat, and some image generation. And being able to still pay rent after spending all my savings on a GPU with a pathetic amount of VRAM (LOOKING AT BOTH OF YOU, BUT ESPECIALLY YOU NVIDIA, YOU JERK). I'd prefer GPUs under $600 if possible, but I also want to run models like Mistral Small, so I suppose I have no choice but to spend a huge sum of money.

Thanks


You can probably tell that I'm not very happy with the current PC consumer market but I decided to post in case we find any gems in the wild.

[–] [email protected] 4 points 1 day ago (1 children)

I got it working with my 6800 XT. I'm running DeepSeek-R1 14B (somewhere around there) and DeepSeek-Coder-V2. I have a blog post with the instructions:

https://gotosocial.michaeldileo.org/@mdileo/statuses/01JQA4M4Q33PMCADH9M2AWQSS8

[–] [email protected] 1 points 1 day ago (1 children)

Thank you. Are 14B models the biggest you can run comfortably?

[–] [email protected] 2 points 1 day ago (1 children)

The coder model only comes in that one size. The ones bigger than that are 20 GB+, and my GPU has 16 GB. I've only tried two models, but it looks like sizes balloon after that, so 14B may be the biggest I can run.

[–] [email protected] 1 points 1 day ago (1 children)

Do you have any recommendations for running the Mistral Small model? I'm very interested in it, alongside CodeLlama, Oobabooga, and others.

[–] [email protected] 0 points 1 day ago (1 children)

I haven't tried those, so not really, but with Open WebUI you can download and run anything; just make sure it fits in your VRAM so it doesn't run on the CPU. The DeepSeek one is decent. I find I like ChatGPT-4o better, but it's still good.
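As a rough sanity check on "make sure it fits in your VRAM", the weight footprint is easy to estimate by hand. A back-of-envelope sketch (the parameter counts and the ~1.5 GB runtime overhead are assumptions, not exact Ollama numbers):

```python
# Rough check of whether a model's weights fit in VRAM, so the runtime
# doesn't silently offload layers to the CPU. Back-of-envelope only:
# bytes ~= parameters * (bits per weight / 8), plus some runtime overhead.

def weights_gb(params_billions: float, quant_bits: float) -> float:
    """Approximate weight footprint in GiB."""
    return params_billions * 1e9 * quant_bits / 8 / 1024**3

def fits(params_billions: float, quant_bits: float, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights + assumed overhead fit in the given VRAM."""
    return weights_gb(params_billions, quant_bits) + overhead_gb <= vram_gb

# A 14B model at 4-bit is ~6.5 GiB and fits a 16 GB card easily;
# a 33B model at 4-bit (~15.4 GiB) plus overhead does not.
print(round(weights_gb(14, 4), 1), fits(14, 4, 16))  # 6.5 True
print(round(weights_gb(33, 4), 1), fits(33, 4, 16))  # 15.4 False
```

This matches the commenter's experience above: 20 GB+ models simply don't fit a 16 GB card.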

[–] [email protected] 1 points 1 day ago (1 children)

In general, how much VRAM do I need for 14B and 24B models?

[–] [email protected] 2 points 1 day ago (1 children)

It really depends on how you quantize the model, and the K/V cache as well. This is a useful calculator: https://smcleod.net/vram-estimator/. I can comfortably fit most 32B models quantized to 4-bit (usually Q4_K_M or IQ4_XS) on my 3090's 24 GB of VRAM with a reasonable context size. If you need a much larger context window to input large documents, you'd have to go smaller on model size (14B, 27B, etc.), get a multi-GPU setup, or use something with unified memory and a lot of RAM (like the Mac Minis others are mentioning).
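To get a feel for what that calculator is doing, here's a minimal sketch of the two dominant terms: quantized weights plus K/V cache. The layer/head numbers are illustrative for a 32B-class model with grouped-query attention, not any specific checkpoint:

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Quantized weight footprint in GiB (Q4_K_M averages roughly 4.5 bits/weight)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, cache_bits: int = 16) -> float:
    """K/V cache footprint in GiB: 2 tensors (K and V) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * cache_bits / 8 / 1024**3

# Illustrative 32B model: 64 layers, 8 KV heads (GQA), head_dim 128.
w = weights_gb(32, 4.5)                     # ~16.8 GiB of weights
kv = kv_cache_gb(64, 8, 128, ctx_len=8192)  # 2.0 GiB at fp16 with 8k context
print(round(w, 1), round(kv, 1), round(w + kv, 1))  # 16.8 2.0 18.8 -> fits in 24 GB
```

Note how the weights dominate at short contexts, but the cache term grows linearly with context length; quantizing the cache to 8-bit halves it, which is why people quantize the K/V cache too.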

[–] [email protected] 1 points 1 day ago

Oh, and I typically get 16-20 tok/s running a 32B model on Ollama through Open WebUI. Also, I've experienced issues with 4-bit quantization of the K/V cache on some models, so just FYI.