this post was submitted on 26 Dec 2023
8 points (75.0% liked)

Selfhosted


Simple as that: I have a few GPUs hanging around and I want to build a GPU node for my Kubernetes cluster. I already have one with GPUs attached, and it can do cool things like transcoding for Plex easily enough, but I'm starting to look to the horizon for my next big projects.

So, dream machine, how could I accomplish building a big ol' server with the GPUs I have hanging around and maybe a new one?

What CPU/Mobo, and maybe most important, power supply, would you recommend for running maybe even 4x GPUs? Or maybe just 3 and I'm being crazy.

Won't be building tomorrow like I said, but I just thought of this and was like "man, I bet not a lot of consumer hardware would support this..."

top 11 comments
[–] [email protected] 3 points 10 months ago* (last edited 10 months ago) (2 children)

If you are doing high-bandwidth GPU work, then the PCIe lanes of consumer CPUs are going to be the bottleneck, as they generally only support 16 lanes.
Then there are the Threadrippers, Xeons and all the server/professional-class CPUs that will do 40+ lanes of PCIe.

A lane of PCIe 3.0 is about 1 GB/s (bytes, not bits).
So, if you know your workload and bandwidth requirements, then you can work from that.
If you don't need the full 16 lanes per GPU, then a motherboard that supports bifurcation will allow you to run 4 GPUs with 4 lanes each from a CPU that has 16 lanes of PCIe. That's 4 GB/s per GPU, or 32 Gb/s.
If it's just for transcoding, and you are running into the limitations of consumer GPUs (which I think are limited to 3 simultaneous streams), you could get a pro/server GPU like the Nvidia Quadros, which have a fixed amount of resources but no cap on the number of streams they can process (so one might be able to do 300 FPS of 1080p; if your content is 1080p at 30 FPS, that's 10 streams). From that, you can work out bandwidth requirements and see if you need more than 4 lanes per GPU.
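For anyone who wants to plug in their own numbers, here is a rough Python sketch of the arithmetic above. It assumes the ~1 GB/s per PCIe 3.0 lane figure and an even split of lanes across GPUs; the function names are made up for illustration, and the 300 FPS budget is just the hypothetical figure from the example, not a measured spec.

```python
# Rough figure from the comment above: ~1 GB/s per PCIe 3.0 lane (bytes, not bits).
PCIE3_GBYTES_PER_LANE = 1.0


def per_gpu_bandwidth(total_lanes: int, gpus: int) -> tuple[int, float, float]:
    """Split lanes evenly across GPUs; return (lanes per GPU, GB/s, Gb/s)."""
    lanes = total_lanes // gpus
    gbytes = lanes * PCIE3_GBYTES_PER_LANE
    return lanes, gbytes, gbytes * 8


def max_streams(gpu_fps_budget: float, stream_fps: float) -> int:
    """How many streams fit in a GPU's total frames-per-second budget."""
    return int(gpu_fps_budget // stream_fps)


lanes, gbytes, gbits = per_gpu_bandwidth(total_lanes=16, gpus=4)
print(f"{lanes} lanes per GPU: ~{gbytes:.0f} GB/s (~{gbits:.0f} Gb/s)")
print(f"{max_streams(300, 30)} streams of 1080p30")
```

With 16 lanes and 4 GPUs this reproduces the 4 GB/s (32 Gb/s) per GPU and 10-stream figures; swap in your own lane count or FPS budget to see where the limit lands.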

I'm not sure what's required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.

Ultimately, if you think your workload can consume more than 4 lanes per GPU, then you have to think about where that data is coming from. If it's coming from disk, then you are going to need RAID 0 NVMe storage, which will take up additional PCIe lanes.
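To get a feel for how quickly the lane count adds up once storage is included, here is a back-of-the-envelope Python sketch. It assumes ~1 GB/s per PCIe 3.0 lane, that each NVMe drive sits on an x4 link, and that RAID 0 scales perfectly with every drive saturating its link, which real arrays won't quite manage. The `lane_budget` helper and the example numbers are hypothetical.

```python
import math

GBYTES_PER_LANE = 1.0   # ~1 GB/s per PCIe 3.0 lane, as quoted above
LANES_PER_NVME = 4      # assumption: each NVMe drive uses an x4 link


def lane_budget(gpus: int, lanes_per_gpu: int, target_gbytes_per_gpu: float) -> tuple[int, int]:
    """Estimate drives and total lanes needed to keep every GPU fed from disk."""
    demand = gpus * target_gbytes_per_gpu
    # Idealised RAID 0: throughput scales linearly and each drive saturates its link.
    per_drive = LANES_PER_NVME * GBYTES_PER_LANE
    drives = math.ceil(demand / per_drive)
    total_lanes = gpus * lanes_per_gpu + drives * LANES_PER_NVME
    return drives, total_lanes


drives, lanes = lane_budget(gpus=4, lanes_per_gpu=8, target_gbytes_per_gpu=6.0)
print(f"{drives} NVMe drives, {lanes} PCIe lanes in total")
```

Under those assumptions, four GPUs at x8 each pulling 6 GB/s from disk would want 6 drives and 56 lanes, which is firmly Threadripper/Xeon territory.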

[–] [email protected] 3 points 10 months ago

I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.

If you're talking about training models, I think it requires both massive compute and massive amounts of data.

[–] [email protected] 2 points 10 months ago (1 children)

The Nvidia transcode limit is 5 for consumer GPUs these days, and it's very easy to lift that limit if you need to with https://github.com/keylase/nvidia-patch

[–] [email protected] 1 points 10 months ago (1 children)

5? Holy heck, that's amazing. I remember helping people who had built streaming rigs to use during the pandemic, and wondering why their production was stuttering and having issues with a bunch of remote callers. Some of that work ended up being CPU bound.
Although, it looks like that patch is for Linux? Not much use if you're running vMix or some other Windows-only software.
In OP's case, however, that's not a problem.

[–] [email protected] 2 points 10 months ago

I think you can get it to work with Windows somehow, but I've never needed to try: https://github.com/keylase/nvidia-patch/issues/520

[–] [email protected] 2 points 10 months ago (2 children)

I wonder if you could copy (or buy used) some crypto mining rigs for this. I'm not sure if there's some kind of bottleneck I'm not aware of, though.

[–] [email protected] 4 points 10 months ago (1 children)

That's what I was thinking, but less... fire hazard? I've seen some of those that are just crazy. Idk, I mostly need a board that can handle it. Just dreaming of a new project with spare stuff hanging around.

[–] [email protected] 1 points 10 months ago (1 children)

I would guess they're a fire hazard because of the overclocking they do. They're either a long-term (heh) project and they're immaculate, or the owners know they need to squeeze out every bit of value and abuse the fuck out of those GPUs. I think you can tell if a rig is dangerous, so you should be ok.

[–] [email protected] 1 points 10 months ago

Yeah I watched some of those and they were just nuts, just wires running everywhere, PSUs hacked so they can have multiple PSUs on a single node, paper clips - just stupid.

[–] [email protected] 1 points 10 months ago* (last edited 10 months ago)

Often the mining rigs use just 1-4 PCIe lanes (per GPU), because more isn't required for mining and it saves on other costs.

[–] [email protected] 1 points 10 months ago

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
NVMe            Non-Volatile Memory Express interface for mass storage
PCIe            Peripheral Component Interconnect Express
PSU             Power Supply Unit

[Thread #395 for this sub, first seen 1st Jan 2024, 20:05]