this post was submitted on 31 Dec 2023

I recently got it into my head to compare the popular video codecs to better understand how av1 works and looks compared to x264 and x265. I was also thinking about using an Intel video card to compress footage from a home video security setup, and wanted to know what level of compression I would need to get good results.

The Setup
I used the 4K, 6.3 GB Blender project Tears of Steel as the source. I downscaled the video to 1080p with all three codecs, then compared the results at various crf levels.

To compare results, I used imgsli, FFMetrics, and my own picture viewer to look for differences.

The Results

| crf | av1 KB | x265 KB | x264 KB |
|-----|--------|---------|---------|
| 18 | 419,261 | 632,079 | 685,217 (x264 visually lossless) |
| 21 | 352,337 | 390,358 (x265 visually lossless) | 411,439 |
| 24 | 301,517 (av1 VMAF visually lossless) | 250,426 | 263,524 (x264 good enough) |
| 27 | 245,685 | 165,079 (x265 good enough) | 176,919 |
| 30 | 205,008 | 110,062 | 122,458 |
| 33 | 168,192 | 73,528 | 86,899 |
| 36 | 139,379 (av1 my visually lossless) | 48,516 | 63,214 |
| 39 | 116,096 | 31,670 | 47,161 |
| 42 | 97,365 (av1 my good enough) | 20,636 | 35,801 |
| 45 | 81,805 | 13,598 | 27,484 |
| 48 | 69,044 | 9,726 | 20,823 |
| 51 | 58,316 | 8,586 (worst possible) | 16,120 (worst possible) |
| 54 | 48,681 | - | - |
| 57 | 39,113 | - | - |
| 60 | 29,062 | - | - |
| 63 | 16,533 (worst possible) | - | - |

Here is av1 crf 36 vs crf 24.

I go into more detail on the hows and whys of my choices, and how I came to these conclusions, in my journal-style blog post. In essence, if you want to lose practically no visual information, crf 24 through 36 for av1, crf 21 for x265, and crf 18 for x264 will do the job.

If you are low on space, using my 'good enough' choices will get you practically the same visual results while using less space, depending on the codec.
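To put those picks side by side, here is a minimal Python sketch that compares file sizes at each codec's chosen crf rather than at equal crf. The sizes are copied from the results table; the dictionary layout and the function name are only for illustration, and the av1 entry uses the post's own crf 36 pick rather than the VMAF-based crf 24.

```python
# File sizes in KB, copied from the results table in the post.
# Each entry is (crf, size_kb) at the quality level picked for that codec.
VISUALLY_LOSSLESS = {"av1": (36, 139_379), "x265": (21, 390_358), "x264": (18, 685_217)}
GOOD_ENOUGH = {"av1": (42, 97_365), "x265": (27, 165_079), "x264": (24, 263_524)}


def relative_sizes(picks, baseline="x264"):
    """Return each codec's size as a fraction of the baseline codec's size."""
    base_kb = picks[baseline][1]
    return {codec: size_kb / base_kb for codec, (_crf, size_kb) in picks.items()}


for label, picks in (("visually lossless", VISUALLY_LOSSLESS), ("good enough", GOOD_ENOUGH)):
    for codec, fraction in relative_sizes(picks).items():
        crf = picks[codec][0]
        print(f"{label:18} {codec:5} crf {crf}: {fraction:.0%} of the x264 size")
```

On these numbers the av1 encode at crf 36 comes out at roughly a fifth of the size of the x264 encode at crf 18.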

[–] [email protected] 17 points 10 months ago (1 children)

You might want to use a code block instead of bullet points for your table; the way you presented it is unreadable, but I found the info on your blog page.

One of my criteria for video formats is portability. Sometimes I might watch something through a web browser, which natively supports x264. Yeah, x265 provides better compression, and AV1 certainly looks interesting, but they both require adding codecs on most of my viewing devices, and in some cases that's not possible.

For most cases I've found that CRF25 with x264 works reasonably well. I tend to download 720p videos to watch on our 1080p TV and don't notice the difference except in very minor situations like rapid motion on a solid-color background (usually only seen on movie studio logo screens). Any sort of animated shows can go even lower without noticeable degradation.

[–] [email protected] 3 points 10 months ago (2 children)

I did try to format the table here better. I used code blocks the first time, and it ended up being even uglier. After about four edit attempts I kinda just gave up. Tables don't seem to exist as far as I can tell either.

Your experience with x264 just about matches up with mine. As long as I don't pixel peep, crf 24 does a pretty great job of conveying the information. It also does a pretty great job of working with just about everything compatibility-wise. I don't expect it to go away any time soon specifically because of that.

AV1 is super neat in that we can buy hardware-accelerated encoding for it really cheaply with the Intel Arc video cards, and it can be decoded by their latest CPU generation. It makes for a great choice for something like security camera footage where playback compatibility is good enough (you can play it in a modern PC), hardware encoding works with a 200$ card, and you save a lot of money using the video card instead of buying extra storage space.

[–] [email protected] 10 points 10 months ago (1 children)
Tables do exist !
| Tables | do | exist | ! |
|--------|----|-------|---|
[–] [email protected] 7 points 10 months ago

Stolen. Thank you.

[–] [email protected] 1 points 10 months ago (1 children)

with a 200$ card, and you save a lot of money using the video card instead of buying extra storage space.

With $200, you could buy ~12TB worth of HDD(s) instead. You'd need >36TB of video for that to make financial sense and you'd always lose quality.

Additionally, you'd have to factor in the power it needs to transcode but, with HW accel, it's not quite as much as with CPUs.
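The break-even figure can be reproduced with a little arithmetic. This sketch assumes re-encoding shrinks the library by about one third, which is the saving implied by the 12 TB and 36 TB figures in the comment rather than something measured; the function name is hypothetical, and power costs are left out.

```python
def break_even_library_tb(card_cost_usd, usd_per_tb, savings_fraction):
    """Library size (TB) at which the space saved by re-encoding equals
    the disk space the card's price would have bought outright."""
    tb_bought_instead = card_cost_usd / usd_per_tb
    return tb_bought_instead / savings_fraction


# ~12 TB for $200 is the comment's estimate; the one-third saving is an assumption.
usd_per_tb = 200 / 12
print(break_even_library_tb(200, usd_per_tb, savings_fraction=1 / 3))
```

A smaller saving pushes the break-even point up quickly: at a 20% saving the same card only pays for itself on a library of about 60 TB.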

[–] [email protected] 6 points 10 months ago (1 children)

Sure, but that is a choice that couldn't be made without first checking how much space is saved by switching codecs. This helps with making that decision, but I'm well aware it is only part of the information needed.

[–] [email protected] 3 points 10 months ago

Oh the data is absolutely fine and helpful; I only take issue with the conclusion ;)

[–] [email protected] 13 points 10 months ago (1 children)

Feels like certain information is missing. You get very different results both in encoding time and file size depending on what preset you use.

CRF value also can't be translated 1:1 between codecs so comparing e.g. h265 CRF 21 to h264 CRF 21 doesn't mean much.

[–] [email protected] 4 points 10 months ago (1 children)

I consider the 'good enough' level to be, if I didn't pixel peep, I couldn't tell the difference. The visually lossless levels were the first crf levels where I couldn't tell a quality difference even when pixel peeping with imgsli. I also included VMAF results, which say that the quality loss levels are all the same at a pixel level.

I know that av1, x264, and x265 all have different ways of compressing video. Obviously, the whole point of this was to get a better idea of what that actually looked like. Everything in the visually lossless section is completely indistinguishable to my eyes, and everything in the good enough section has very minor bits of compression only noticed when I'm looking for it in a still image. This does not require the same codec to compare and contrast with.

Frankly, for anything other than real-time encoding, I don't actually consider encoding time to be a huge deal. None of my encodes were slower than 3fps on my 5800x3d, which is plenty for running on my media server as an overnight job. For real-time encoding, I would just grab an Intel Arc card, and redo the whole thing since the bitrates will be different anyways.

[–] [email protected] 3 points 10 months ago (1 children)

Frankly, for anything other than real-time encoding, I don't actually consider encoding time to be a huge deal. None of my encodes were slower than 3fps on my 5800x3d, which is plenty for running on my media server as an overnight job. For real-time encoding, I would just grab an Intel Arc card, and redo the whole thing since the bitrates will be different anyways.

Encoding speed heavily depends on your preset. Veryslow will give you better compression than medium or fast, but at a heavy expense of encoding speed. You're not gonna re-encode a movie overnight on the slow preset. GPU encoding will also give you worse results than a CPU encode, so that's something one would have to take into consideration. It's not a big deal when you're streaming, but if it's for video files, I'd much prefer using the CPU.

I consider the 'good enough' level to be, if I didn't pixel peep, I couldn't tell the difference. The visually lossless levels were the first crf levels where I couldn't tell a quality difference even when pixel peeping with imgsli. I also included VMAF results, which say that the quality loss levels are all the same at a pixel level.

I was mostly talking about how you organised your table by using CRF values as the rows. It implies that one should compare the results in each row, but that comparison doesn't make much sense. E.g. looking at row "24" one might think that av1 is less effective than h264/5 due to the greater file size, but the video quality is vastly different. A more "informative" way to present the data might have been to organise the rows by VMAF score.

Hopefully I don't come across as too cross or argumentative, I just want to give some feedback on how to present the data in a clearer way for people who aren't familiar with how encoding works.
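Organising rows by VMAF score means scoring every encode against a reference first. A minimal sketch of that step, assuming an ffmpeg build with libvmaf enabled and a reference file at the same resolution and frame rate as the encode; the file names and the helper name are hypothetical.

```python
def vmaf_command(encoded, reference, log_path="vmaf.json"):
    """Build an ffmpeg argument list that scores `encoded` against `reference`.

    Assumes both files share resolution and frame rate, and that ffmpeg
    was built with libvmaf support.
    """
    return [
        "ffmpeg",
        "-i", encoded,      # first input: the distorted (encoded) video
        "-i", reference,    # second input: the reference it is compared against
        "-lavfi", f"[0:v][1:v]libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",  # discard the video output; only the score log is wanted
    ]


# Hypothetical file names, for illustration only.
print(" ".join(vmaf_command("dest.1080p.av1.mkv", "reference.1080p.mkv")))
```

Passing the returned list to `subprocess.run` would write the per-frame and pooled scores to the JSON log, which could then be used to sort encodes by score instead of by crf.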

[–] [email protected] 4 points 10 months ago (1 children)

Why is GPU encoding worse than CPU encoding?

[–] [email protected] 7 points 10 months ago

GPU encoding uses (relatively) simpler fixed function encoders that do it much faster than the CPU which uses its general purpose transistors to run an encoding algorithm. End result is GPU encoding is speedy at the cost of visual quality per bitrate; the file size is bigger for same visual quality as a CPU encode. Importantly for storing your videos - CPU encoding, while much slower, will get your file size smaller at the same visual quality threshold you desire, so you can save more videos per drive!

[–] [email protected] 8 points 10 months ago

I would like to have seen more data in that table: the time it took to run each video compression and the final bitrate of each stream. Besides that, very interesting results.
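The average bitrate can be derived from the file sizes in the table once the running time is known. A rough sketch, assuming the table's KB means 1024 bytes and that the clip runs about 734 seconds; neither figure is stated in the post, and the function name is hypothetical.

```python
def average_bitrate_kbps(size_kb, duration_s, bytes_per_kb=1024):
    """Average bitrate in kilobits per second for a file of `size_kb`."""
    bits = size_kb * bytes_per_kb * 8
    return bits / duration_s / 1000


# av1 crf 36 size from the table; 734 s is an assumed running time.
print(round(average_bitrate_kbps(139_379, 734)))
```

Container overhead is included in the file size, so this slightly overstates the bitrate of the video stream itself.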

[–] [email protected] 8 points 10 months ago (2 children)

The "av1" numbers, which codec is that? There are many av1 encoders and even for Intel HW accel, there are at least two.

[–] [email protected] 1 points 10 months ago (1 children)

From my blogpost, I'm using the following command to encode the video:

ffmpeg -i source.2160p.mkv \
  -map 0:v:0 \
  -map -0:a -map -0:s -map_metadata -1 \
  -c:v libsvtav1 \
  -preset 3 \
  -vf scale=w=1920:h=-2 \
  -crf 23 \
  dest.1080p.av1.mkv
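The table in the post covers many crf values per codec, so a sweep like that could be scripted. The sketch below generates one ffmpeg argument list per codec and crf value, using the crf ranges from the table. The `libx264` and `libx265` encoder names are real ffmpeg encoders, but the presets used for them are not given in the thread, so they are left at encoder defaults here; the output naming scheme is hypothetical, and the audio and subtitle exclusions are dropped because only the first video stream is mapped.

```python
# CRF ranges taken from the results table: av1 goes up to 63, x264/x265 stop at 51.
ENCODERS = {
    "av1": ("libsvtav1", range(18, 64, 3)),
    "x265": ("libx265", range(18, 52, 3)),
    "x264": ("libx264", range(18, 52, 3)),
}


def sweep_commands(source="source.2160p.mkv"):
    """Yield one ffmpeg argument list per codec and CRF value."""
    for name, (encoder, crfs) in ENCODERS.items():
        for crf in crfs:
            cmd = [
                "ffmpeg", "-i", source,
                "-map", "0:v:0",             # first video stream only
                "-map_metadata", "-1",       # drop metadata
                "-c:v", encoder,
                "-vf", "scale=w=1920:h=-2",  # 1920 wide, height keeps aspect ratio
                "-crf", str(crf),
            ]
            if encoder == "libsvtav1":
                cmd += ["-preset", "3"]      # preset from the posted command
            cmd.append(f"dest.1080p.{name}.crf{crf}.mkv")
            yield cmd


for cmd in sweep_commands():
    print(" ".join(cmd))
```

Printing the commands instead of running them makes it easy to check the sweep before committing hours of encode time.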

[–] [email protected] 3 points 10 months ago (2 children)

That is not representative of what you'd get with an Intel card then. While they implement the same standard (AV1), they're entirely different encoders with entirely different image quality characteristics.

[–] [email protected] 2 points 10 months ago (1 children)

How does that work? Aren't two encoders of the same format supposed to produce the same output for the same input and configuration using some given algorithm? Otherwise I'd consider them different formats/codecs... 🤷‍♂️ Maybe that's wrong of me?

[–] [email protected] 2 points 10 months ago (1 children)

The issue is, you can optimize a software encoder continually, you can use tricks for better quality etc.

A hardware encoder is just that - hardware. As soon as it's burned to the silicon, you're not making any (at least substantial) changes to it. You might also be limited by what you can actually do directly in hardware without using too much die space.

Tldr.: no, you won't get the same result

[–] [email protected] 2 points 10 months ago (1 children)

Tldr.: no, you won't get the same result

What I'm saying is, shouldn't you?

[–] [email protected] 8 points 10 months ago* (last edited 10 months ago) (1 children)

What you describe is true for many file formats, but for most lossy compression systems the "standard" basically only strictly explains how to decode the data and any encoder that produces output that successfully decodes that way is fine.

And the standard defines a collection of "tools" that encoders can use; how exactly to use, combine and tweak those tools is up to the encoder.

And over time new/better combinations of these tools are found for specific scenarios. That's how different encoders of the same codec can produce very different output.

As a simple example, almost all video codecs by default describe each frame relative to the previous one (i.e. they describe which parts moved and what new content appeared). There is of course also the option to send a completely new frame, which usually takes up more space. But when one scene cuts to another, sending a new frame can be much better. A "bad" encoder might not have "new scene" detection and still try to "explain the difference" to the previous scene, which can easily take up more space than just sending the entire new frame.
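The scene-cut example can be illustrated with a toy model. This is not how any real encoder measures cost: frames are short lists of pixel values, a full frame costs one unit per pixel, and a delta costs two units (position and value) per changed pixel. All names are made up for the sketch. Both strategies produce a stream the same decoder could read; only the size differs.

```python
def delta_cost(prev, cur):
    """Cost of describing `cur` as changes to `prev`: two units
    (position and value) per pixel that differs."""
    return 2 * sum(1 for a, b in zip(prev, cur) if a != b)


def intra_cost(cur):
    """Cost of sending the whole frame: one unit per pixel."""
    return len(cur)


def choose_frame_types(frames, scene_detection=True):
    """Return 'I' (full frame) or 'P' (delta) for each frame.

    Without scene detection the encoder only sends a full frame at the start.
    """
    types = ["I"]
    for prev, cur in zip(frames, frames[1:]):
        if scene_detection and intra_cost(cur) < delta_cost(prev, cur):
            types.append("I")
        else:
            types.append("P")
    return types


def total_cost(frames, types):
    """Sum the cost of the stream for the given frame-type decisions."""
    cost = 0
    for i, kind in enumerate(types):
        cost += intra_cost(frames[i]) if kind == "I" else delta_cost(frames[i - 1], frames[i])
    return cost


# Two "scenes": dark frames with small motion, then a cut to bright frames.
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [9, 9, 9, 9, 9, 9],  # scene cut: every pixel changes
    [9, 9, 8, 9, 9, 9],
]
smart = choose_frame_types(frames, scene_detection=True)
naive = choose_frame_types(frames, scene_detection=False)
print(smart, total_cost(frames, smart))  # ['I', 'P', 'I', 'P'] 16
print(naive, total_cost(frames, naive))  # ['I', 'P', 'P', 'P'] 22
```

The encoder with scene detection spends 16 units and the one without spends 22 on the same four frames, which is the kind of gap that separates two encoders implementing the same standard.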

[–] [email protected] 1 points 10 months ago

the “standard” basically only strictly explains how to decode the data and any encoder that produces output that successfully decodes that way is fine

Ah, okay, this explains the whole aspect of it then, for me. :-) If this is how a certain format is described, then it makes sense that encoders can produce different data, which then will be decoded as different output as well, all while all parties are compliant with the specification. That makes much more sense. Thanks for taking the time to explain everything, including I-frames and P-frames! ;-)

[–] [email protected] 2 points 10 months ago (1 children)

Doesn't libsvtav1 do the same on all platforms since it's CPU-based? At least that's the exact encoder OP specified

[–] [email protected] 2 points 10 months ago

Yes, yes it will. (Well, at least it should. If it doesn't, that's a bug.)

The problem here is that the premise of this post is evaluating buying a GPU with an AV1 encoder in order to transcode a media library. Any GPU-based AV1 encoder will produce very different results from svt-av1, and likely much worse ones.

[–] [email protected] 1 points 10 months ago

It's svt-av1, as can be seen from the ffmpeg command in the article.

[–] [email protected] 6 points 10 months ago

Thanks for posting. I'm still new to this and had no idea what settings I should be using.

[–] [email protected] 5 points 10 months ago (1 children)

Can you explain what you mean by "visually lossless"? Is this a purely subjective classification, or is there a specific definition or benchmark you used?

[–] [email protected] 8 points 10 months ago* (last edited 10 months ago)

Visually lossless means I couldn't tell an image difference even when pixel peeping with imgsli. Good enough means I couldn't tell a difference in video, but could occasionally see a compression artifact in imgsli.

The VMAF results are purely objective measurements. You can read more about it here: https://en.wikipedia.org/wiki/Video_Multimethod_Assessment_Fusion

[–] [email protected] 1 points 10 months ago

I've also gone down that rabbit hole and found Vivictpp pretty good. It allows you to play two videos so you can swipe between them like imgsli you mentioned.

There's a whole range of measurements trying to approximate quality differences between a video source and an encode: PSNR, SSIM, VMAF, MS-SSIM. All of them have some strong areas and tricks you can use to cheat them.