this post was submitted on 29 Apr 2024
195 points (94.9% liked)

Technology

[–] [email protected] 12 points 6 months ago (7 children)

No surprise, and this is going to happen to everybody who uses neural net models in production. You can't tell where a given piece of data sits inside the model, and therefore it is unbelievably hard to change that data.

So, if you have legal obligations to know it, or to delete some data, then you are deep in the mud.

[–] [email protected] 1 points 6 months ago (1 children)

I think of ChatGPT as a "text generator", similar to how Dall-E is an "image generator".
If I were OpenAI, I would post a fictitious person disclaimer at the bottom of the page and hold the user responsible for what the model does. Nobody holds Adobe responsible when someone uses Photoshop.

[–] [email protected] 3 points 6 months ago (2 children)

> I would post a fictitious person disclaimer

... or you could read the GDPR and learn that such excuses are void.

[–] [email protected] 0 points 6 months ago (1 children)

LLMs don't actually store any of their training data, though. And any data being held in context is easily accessible and can be wiped or edited to remove personal data as necessary.

[–] [email protected] 4 points 6 months ago (1 children)

> LLMs don't actually store any of their training data,

Data protection law covers all kinds of data processing.

For example, input is processing, too. Output is processing, too. See the definition of "processing" in Article 4 of the GDPR.

If you really want to rely on excuses, you would need wayyy better ones.

[–] [email protected] 0 points 6 months ago

Right, so keep personal data out of the training set and use it only in the easily readable and editable context. It'll still "hallucinate" details about people if you ask it for details about people, but those people are fictitious.
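
A rough sketch of what that separation could look like, assuming a hypothetical in-memory store. `ContextStore`, `build_prompt`, and the record fields are made up for illustration and are not any real LLM API; the point is only that data kept in context can be looked up, corrected, and deleted like any ordinary record:

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical store for personal data kept outside the model weights."""
    records: dict = field(default_factory=dict)

    def put(self, subject_id: str, facts: dict) -> None:
        # Store a copy so later changes by the caller don't leak in.
        self.records[subject_id] = dict(facts)

    def rectify(self, subject_id: str, key: str, value: str) -> None:
        # Correct a single stored fact about a person.
        self.records[subject_id][key] = value

    def erase(self, subject_id: str) -> bool:
        # Delete everything stored about a person; True if something was removed.
        return self.records.pop(subject_id, None) is not None


def build_prompt(store: ContextStore, subject_id: str, question: str) -> str:
    """Assemble the text a model would see: stored facts plus the question."""
    facts = store.records.get(subject_id)
    if facts is None:
        context = "No stored information about this person."
    else:
        context = "\n".join(f"{k}: {v}" for k, v in sorted(facts.items()))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = ContextStore()
store.put("user-1", {"name": "Alice Example", "city": "Vienna"})
store.rectify("user-1", "city", "Graz")
print(build_prompt(store, "user-1", "Where does this person live?"))
store.erase("user-1")
print(build_prompt(store, "user-1", "Where does this person live?"))
```

After `erase`, the prompt no longer contains anything about that person, which is the part you can't easily do for data baked into the weights.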

[–] [email protected] 0 points 6 months ago (1 children)

You just wasted a lot of my time. What did I do to deserve this?

[–] [email protected] 1 points 6 months ago

... said the sparrow and flew out of the library.
