this post was submitted on 07 Apr 2024
339 points (93.1% liked)

[–] [email protected] 4 points 7 months ago

After reading this article that got posted on Lemmy a few days ago, I honestly think we're approaching the soft cap for how good LLMs can get. Improving on the current state of the art would require feeding them more data, but that's not really feasible. We've already scraped pretty much the entire internet to get to where we are now, and it's nigh-impossible to manually curate a higher-quality dataset because of the sheer scale of the task involved.

We also can't ask AI to curate its own dataset, because that runs into model collapse issues. Even if we don't have AI explicitly curate its own dataset, contamination is highly likely to become a problem in the near future given the tide of AI-generated spam. I have a feeling that companies like Reddit signing licensing deals with AI companies are going to find that the buyers mostly want data from 2022 and earlier, similar to manufacturers looking for low-background steel to make particle detectors.

We also can't just throw more processing power at it, because current LLMs are already nearly cost-prohibitive in terms of processing power per query (it's just being masked by VC money subsidizing the cost). Even if cost weren't an issue, we're also starting to approach hard physical limits, like waste heat, on how much faster we can run current technology.

So we already have a pretty good idea what the answer to "how good AI will get" is, and it's "not very." At best, it'll get a little more efficient with AI-specific chips, and some specially-trained models may provide some decent results. But as it stands, pretty much any organization that tries to use AI in any public-facing role (including merely using AI to write code that is exposed to the public) is just asking for bad publicity when the AI inevitably makes a glaringly obvious error. It's marginally better than the old memes about "I trained an AI on X episodes of this show and asked it to make a script," but not by much.

As it stands, I only see two outcomes: 1) OpenAI manages to come up with a breakthrough--something game-changing, like a technique that drastically increases the efficiency of current models so they can be run cheaply, or something entirely new that could feasibly be called AGI; or 2) the AI companies hit a brick wall, and the flow of VC money gradually slows down, forcing the companies to raise prices and cut costs, resulting in a product that's even worse-performing and more expensive than what we have today. In the second case, the AI bubble will likely pop, and most people will abandon AI in general--the only people still using it at scale will be the ones trying to push disinfo (either in politics or in Google rankings), along with the odd person playing with image generation.

In the meantime, I'm worried for the people working for idiot CEOs who buy into the hype, but most of all I'm worried for artists doing professional graphic design or video production--they're going to have their lunch eaten by Stable Diffusion and Midjourney taking all the bread-and-butter logo design jobs that many artists rely on for their living. But hey, they can always do furry porn instead, I've heard that pays well~