this post was submitted on 29 Jan 2024

Technology

[email protected] 2 points 9 months ago (1 child)

the training data is just a statistical record of human bias.

It's not. It's a record of online conversations, which tend to be more polarized and extreme than people are in real life.

[email protected] 1 point 9 months ago

That's why I said:

So as long as the training data is well selected for your problem...

It's clear that 4chan, Reddit, etc. are over-represented in the training data for LLMs, which explains why ChatGPT might be more awful than an average person. Having an LLM decide on, e.g., college admissions would be like having a Twitter poll decide who should be its next CEO. Like that's obviously stupid, nobody would ever do that, right?

The problem is that in the college admission example, the models were trained on previous admission decisions made by college employees, and these models are still biased.