[–] [email protected] 7 points 3 days ago

Yes, but:

To get to this point, OpenAI had to suck up almost all of the data ever generated in the world. So for it to become better, let's say it needs three times as much data: collecting that alone would take more than three lifetimes, and that's IF we ignore the AI slop and assume all new data is still human-made, which is just not true.

In other words: what you describe will just about never happen, at least not for as long as 2025 is still remembered.

[–] [email protected] 3 points 3 days ago

Yes, true, but that is assuming:

  1. Any potential future improvement comes solely from ingesting more useful data.
  2. The amount of data being produced is not ever-increasing (even excluding AI slop).
  3. No (new) techniques that make training more data-efficient are published or engineered.
  4. No (new) techniques that improve reliability are used, e.g. by specializing the model for code auditing.

What the author of the blog post has shown is that it can find useful issues even now. If you apply it to a codebase, have a human label each reported issue as real or false, and train the model to make real issues more likely and false positives less likely, it could still be improved specifically for this application. That requires nowhere near as much data as general improvements do.
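As a rough sketch of that feedback loop (purely illustrative: the file name, JSON fields, and threshold below are invented, and instead of fine-tuning the generator itself it trains a cheap scikit-learn reranker that filters out findings resembling past false positives), the human-labeling step could look like this:

```python
# Hypothetical pipeline: human-labeled findings -> a filter for future findings.
# Nothing here comes from the blog post; it only illustrates the idea above.
import json
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented file: one JSON object per model-reported issue, plus a human verdict.
LABELED = Path("labeled_findings.jsonl")

def load_labeled() -> tuple[list[str], list[int]]:
    """Read human-labeled findings: 1 = real issue, 0 = false positive."""
    texts, labels = [], []
    for line in LABELED.read_text().splitlines():
        issue = json.loads(line)
        texts.append(issue["description"])
        labels.append(1 if issue["label"] == "real" else 0)
    return texts, labels

def train_reranker():
    """Fit a TF-IDF + logistic-regression scorer on the labeled findings."""
    texts, labels = load_labeled()
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model

def keep_likely_real(model, new_findings: list[str], threshold: float = 0.5) -> list[str]:
    """Surface only the findings the reranker scores as probably real."""
    probs = model.predict_proba(new_findings)[:, 1]
    return [f for f, p in zip(new_findings, probs) if p >= threshold]
```

Retraining periodically as human verdicts accumulate would tighten the filter over time, and the same labeled data could later feed an actual fine-tune of the underlying model.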

While I agree that improvements are not a given, I wouldn't assume they can never happen. Despite these companies having effectively exhausted all of the text on the internet, improvements are currently still being made left, right, and center. If the many billions they are spending turn these models into a fancy new tool for making our software safer and more secure: great! If it ends up being an endless money pit and nothing ever comes of it, oh well. I'll just wait and see which of the two it will be.