this post was submitted on 21 Mar 2024
97 points (89.4% liked)

[email protected] 1 points 8 months ago (1 children)

Is it being poisoned because the generated data is garbage or because the generated data is made by an AI?

Using a small model lets it be shown faster, but it also means the outputs are seriously terrible. It's common to fine-tune models on GPT-4 outputs, which directly goes against this.

And there is a correlation between size and performance. It's not a rule per se, and people are working hard on squeezing more and more out of small models, but it's not a fallacy to assume bigger is better.

[email protected] 4 points 8 months ago* (last edited 8 months ago)

I think it's also worth keeping in mind that some people use AI to generate "real sounding" content for clicks, or for scams, rather than making actual decent content. I'd argue humans making shitty content is going to happen on a much worse scale as AI helps automate it. The other thing is I worry AI can't tell human- or AI-made bullshit from decent content as easily as a person can. I may know the top 2 Google results are AI-generated clickbait, but whatever is scraping content en masse may not bother to differentiate. So it might become an exponential issue.