How to use GPUs over multiple computers for local AI? (lemmy.dbzer0.com)

submitted 1 month ago by [email protected] to c/[email protected]

The problem is simple: consumer motherboards don't have that many PCIe slots, and consumer CPUs don't have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 cheap computers, slot one GPU into each, and use them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected over a 10GbE network. I can get a 10GbE network running for this project.

Does Ollama or any other local AI project support this? A server motherboard with a matching CPU is going to get expensive very quickly, so this would be a great alternative.

Thanks
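For a rough sense of the trade-off in the post above, here is a small back-of-the-envelope sketch comparing PCIe slot bandwidth with a 10GbE link. The per-lane figures are the usual approximate one-direction rates after 128b/130b encoding, and the 10GbE figure is the raw line rate; real throughput will be lower on both. The function names are made up for illustration.

```python
# Approximate usable bandwidth per PCIe lane, one direction, in GB/s.
# gen3: 8 GT/s * 128/130 / 8 bits; gen4 doubles that.
PCIE_GB_PER_S_PER_LANE = {"gen3": 0.985, "gen4": 1.969}


def pcie_bandwidth(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    return PCIE_GB_PER_S_PER_LANE[gen] * lanes


def ethernet_bandwidth(gigabits_per_s: float) -> float:
    """Raw line rate of an Ethernet link in GB/s (ignores protocol overhead)."""
    return gigabits_per_s / 8


if __name__ == "__main__":
    net = ethernet_bandwidth(10)
    print(f"10GbE: {net:.2f} GB/s")
    for gen in ("gen3", "gen4"):
        for lanes in (4, 8, 16):
            bw = pcie_bandwidth(gen, lanes)
            print(f"PCIe {gen} x{lanes}: {bw:.2f} GB/s ({bw / net:.1f}x 10GbE)")
```

By this estimate even a gen 3 x4 slot moves roughly three times as much data per second as a 10GbE link, so whether splitting GPUs across machines helps depends on how much traffic the software sends between GPUs during inference.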

[–] [email protected] 4 points 1 month ago* (last edited 1 month ago) (1 children)

If you want to use supercomputer software, set up the SLURM scheduler on those machines. There are many tutorials on how to do distributed GPU computing with SLURM. I have it on my todo list.
https://github.com/SchedMD/slurm
https://slurm.schedmd.com/

[–] [email protected] 1 points 1 month ago (1 children)

Thanks, but I'm not going to run supercomputers. I just want to run 4 GPUs in separate machines to run 24B-30B models, because a single computer doesn't have enough PCIe lanes.

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

I believe you can run 30B models on a single used RTX 3090 (24GB). At least, I run the 32B deepseek-r1 on one using Ollama. Just make sure you have enough RAM (more than 24GB).

[–] [email protected] 1 points 1 month ago (1 children)

Heavily quantized?

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

I run this one: https://ollama.com/library/deepseek-r1:32b-qwen-distill-q4_K_M with this frontend: https://github.com/open-webui/open-webui on a single RTX 3090 with 64GB of RAM. It works quite well for what I wanted it to do. I wanted to connect 2x 3090 cards with SLURM to run 70B models, but I haven't found the time to do it.
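To put numbers on the "heavily quantized?" question, here is a minimal sketch that estimates how much memory the weights alone need at different quantization levels. The bits-per-weight values are approximations (q4_K_M in particular mixes block types, so its average varies by model), and the 2 GB overhead for KV cache and runtime buffers is an assumed placeholder that grows with context length. The function names are hypothetical.

```python
# Approximate average bits per weight for common formats (assumed values).
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 8.5,
    "q4_K_M": 4.85,  # rough average; the real figure varies by model
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params in (32, 70):
        for quant in BITS_PER_WEIGHT:
            size = weights_gb(params, quant)
            print(f"{params}B {quant}: ~{size:.1f} GB, "
                  f"fits 24 GB: {fits(params, quant, 24)}, "
                  f"fits 48 GB: {fits(params, quant, 48)}")
```

Under these assumptions a 32B model at q4_K_M needs roughly 19 to 20 GB for weights, which is why it squeezes onto a single 24GB card, while a 70B model at the same quantization needs a bit over 40 GB and therefore two such cards.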

[–] [email protected] 1 points 1 month ago

I see. Thanks