this post was submitted on 21 Apr 2024
187 points (94.3% liked)

Selfhosted


I placed a low bid on a government auction for 25 EliteDesk 800 G1s and unexpectedly won (ultimately paying less than $20 per computer).

In the long run I plan on selling 15 or so of them to friends and family for cheap, and I'll probably have 4 with Proxmox, 3 for a lab cluster and 1 for the always-on home server and keep a few for spares and random desktops around the house where I could use one.

But while I have all 25 of them what crazy clustering software/configurations should I run? Any fun benchmarks I should know about that I could run for the lolz?

Edit to add:

Specs based on the auction listing and looking up the computer models:

  • 4th gen i5s (probably i5-4560s or similar)
  • 8GB of DDR3 RAM
  • 256GB SSDs
  • Windows 10 Pro (no mention of licenses, so that remains to be seen)
  • Looks like 3 PCIe Slots (2 1x and 2 16x physically, presumably half-height)

Possible projects I plan on doing:

  • Proxmox cluster
  • Baremetal Kubernetes cluster
  • Harvester HCI cluster (which has the benefit of also being a Rancher cluster)
  • Automated Windows Image creation, deployment and testing
  • Pentesting lab
  • Multi-site enterprise network setup and maintenance
  • Linpack benchmark then compare to previous TOP500 lists
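For the Linpack idea, it helps to know the theoretical peak (Rpeak) that the measured result can be compared against. A rough sketch, assuming every box has a 4-core Haswell i5 at a 3.2 GHz base clock (the listing doesn't confirm the exact model) and counting 16 double-precision FLOPs per core per cycle from AVX2 FMA:

```python
def rpeak_gflops(nodes: int, cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: nodes x cores x clock (GHz) x FLOPs per cycle."""
    return nodes * cores * ghz * flops_per_cycle


NODES = 25            # from the post
CORES = 4             # assumption: quad-core 4th-gen i5
GHZ = 3.2             # assumption: base clock, exact CPU model unconfirmed
FLOPS_PER_CYCLE = 16  # Haswell: 2 FMA units x 4 doubles x 2 ops (multiply + add)

per_node = rpeak_gflops(1, CORES, GHZ, FLOPS_PER_CYCLE)
cluster = rpeak_gflops(NODES, CORES, GHZ, FLOPS_PER_CYCLE)

print(f"per node: {per_node:.1f} GFLOPS")
print(f"cluster:  {cluster / 1000:.2f} TFLOPS")
```

This is only a ceiling. A real HPL run will measure less, and a cluster linked over ordinary Ethernet will probably land well below it.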
[–] [email protected] 18 points 6 months ago* (last edited 6 months ago) (2 children)

Run 70b llama3 on one and have a 100% local, GPT-4-level home assistant. Hook it up with Coqui.ai XTTS-v2 for mind-boggling natural-language speech (100% local too) that can imitate anyone's voice. Now you've got yourself Jarvis from Iron Man.

Edit: I thought they were some kind of beast machines with 192GB of RAM and stuff. They're just regular mid-to-low-tier PCs.

[–] [email protected] 7 points 6 months ago (1 children)

I tried doing that on my home server, but running it on the CPU is super slow, and the model won't fit on the GPU. Not sure what I'm doing wrong

[–] [email protected] 3 points 6 months ago (1 children)

Sadly, I can't really help you much. I have a potato PC and the biggest model I've run on it was Microsoft phi-2 using the candle framework. I used to tinker with llama.cpp on Colab, but it seems they don't handle llama3 yet. ollama says it does, but I've never tried it. As for the speed, it's kinda expected for a 70b model to be really slow on the CPU. How slow is too slow? I don't really know...

You can always try the 8b model. People say it's really great and that it even replaced the 70b models they'd been using.

[–] [email protected] 3 points 6 months ago (1 children)

Slow as in I waited a few minutes and finally killed it when it didn't seem like it was going anywhere. And this was with the 7b model...

[–] [email protected] 2 points 6 months ago (1 children)

That shouldn't happen with an 8b model. Even on CPU, it's supposed to be decently fast. There's definitely something wrong here.

[–] [email protected] 1 points 6 months ago (1 children)

Hm... Alright, I'll have to take another look at it. I kinda gave up, figuring my old server just didn't have the specs for it

[–] [email protected] 1 points 6 months ago (1 children)

Specs? Try Mistral with llama.cpp.

[–] [email protected] 1 points 6 months ago (1 children)

It has an Intel Xeon E3-1225 V2, 20GB of RAM, and a Strix GTX 970 with 4GB of VRAM. I've actually tried Mistral 7b and Decapoda Llama 7b, running them in Python with Huggingface's Transformers library (from local models).

[–] [email protected] 2 points 6 months ago* (last edited 6 months ago) (1 children)

Yeah, it's not a potato, but it's not that powerful either. Nonetheless, it should run 7b/8b/9b models easily, and maybe 13b ones.

"running them in Python with Huggingface's Transformers library (from local models)"

That's your problem right there. Python is great for making LLMs but horrible at running them. With a computer as weak as yours, every bit of performance counts.

Just try ollama or llama.cpp. Their GitHub repos are also a goldmine for other projects you could try.
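If you go the ollama route, here's a minimal sketch of calling it from Python over its local REST API instead of loading the model through Transformers. It assumes the ollama server is already running on its default port (11434) and that you've pulled a model beforehand. The `llama3` tag and the helper names are placeholders for illustration.

```python
import json
import urllib.request

# ollama's default local endpoint for one-shot text generation
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running ollama server with the model pulled):
#   print(ask("llama3", "Say hi in five words."))
```

Python only sends the HTTP request here. The inference itself happens inside the ollama process.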

llama.cpp can run part of the model on the GPU for way faster inference.
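To make the partial offload concrete: llama.cpp lets you choose how many of the model's layers go to the GPU (the `-ngl` / `--n-gpu-layers` option), and the rest stay on the CPU. Here's a back-of-the-envelope sketch for picking that number. It assumes the layers are roughly equal in size and uses made-up example figures: a ~4.1 GB 4-bit 7b file with 32 layers, and 1 GB of the card's 4 GB held back for context and scratch space.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float, reserve_gb: float) -> int:
    """Estimate how many equally sized layers fit in VRAM after a reserve."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Example figures are assumptions, not measurements.
n_gpu_layers = layers_that_fit(model_gb=4.1, n_layers=32, vram_gb=4.0, reserve_gb=1.0)
print(f"try offloading about {n_gpu_layers} of 32 layers")
```

Treat the result as a starting point, and lower it if the model fails to load or runs out of VRAM.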

Piper is a pretty decent, very lightweight TTS engine that runs directly on your CPU, if you want to add TTS capabilities to your setup.

Good luck and happy tinkering!

[–] [email protected] 3 points 6 months ago (1 children)

Ah, that's good to know! I'll give those other options a shot. Thank you so much for taking the time to help me with that! I'm very new to the whole LLM things, and sorta figuring it out as I go

[–] [email protected] 1 points 6 months ago

Completely forgot to tell you to only use quantized models. Your PC can run 4-bit quantized versions of the models I mentioned. That's the key to running LLMs on consumer-level hardware. Later you can read up on the different quantizations and play with other ones like Q5_K_M and such.

Just read that phi-3 got released, and apparently it's a 4B model that reaches GPT-3.5 level. Follow the news and wait for it to be added to ollama/llama.cpp.

"Thank you so much for taking the time to help me with that! I'm very new to the whole LLM things, and sorta figuring it out as I go"

I became fascinated with LLMs after the first AI boom, but all this knowledge is basically useless where I live, so I might as well make it useful by teaching people what I know.

[–] [email protected] 2 points 6 months ago (1 children)

These are 10-year-old mid-range machines. Llama 7b won't even run well.

[–] [email protected] 2 points 6 months ago* (last edited 6 months ago) (1 children)

The key is quantized models. A full model wouldn't fit, but a 4-bit 8b llama3 would.
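Rough arithmetic behind that, counting the weights only. Real quantized files run a bit larger because some tensors keep higher precision, and the context cache needs room on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


RAM_GB = 8  # per machine, from the specs in the post

models = [
    ("llama3 8b, 16-bit", 8, 16),
    ("llama3 8b, 4-bit", 8, 4),
    ("llama3 70b, 4-bit", 70, 4),
]

for name, params_billion, bits in models:
    size = weights_gb(params_billion, bits)
    print(f"{name}: ~{size:.0f} GB, under {RAM_GB} GB of RAM: {size < RAM_GB}")
```

Only the 4-bit 8b comes in under 8 GB, and the OS still needs its share of that RAM.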

[–] [email protected] 2 points 6 months ago (1 children)

It would fit but it would be very slow

[–] [email protected] 1 points 6 months ago* (last edited 6 months ago)

No. Quantization makes it go faster. Not blazing fast, but decent.
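The reason it's faster: generating each token means reading essentially all of the weights from memory once, so CPU inference is mostly limited by memory bandwidth, and a smaller model means fewer bytes to read per token. Here's a rough upper bound, assuming dual-channel DDR3-1600 at about 25.6 GB/s peak (the actual memory config of these machines isn't in the listing) and weights-only sizes of 16 GB at 16-bit versus 4 GB at 4-bit for an 8b model:

```python
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling: each token needs one full pass over the weights."""
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 25.6  # assumption: dual-channel DDR3-1600, theoretical peak

full_precision = max_tokens_per_s(BANDWIDTH_GB_S, 16.0)  # 8b at 16-bit
four_bit = max_tokens_per_s(BANDWIDTH_GB_S, 4.0)         # 8b at 4-bit

print(f"16-bit ceiling: {full_precision:.1f} tokens/s")
print(f"4-bit ceiling:  {four_bit:.1f} tokens/s")
```

These are ceilings, not predictions. Real speeds will be lower, but the 4x gap is why quantizing helps even on the same CPU.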