this post was submitted on 27 Nov 2024
566 points (99.0% liked)

Memes

[–] [email protected] 39 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

At this point, any request for information could potentially be used as training data. That includes things like captchas.

I recommend everyone have an extremely literal interpretation of "labor". Unless you have tremendous insight into where your data is going and how it is being used (and perhaps even then), assume any ask is ultimately an ask for unpaid labor.

Obviously you can't avoid things like captchas, but you can avoid things like this.

Edit: and it should go without saying, but anything you upload to socials is probably automatic training data at this point. The best approach is simply not to engage with corporate social networks.

Though Lemmy is not corporately controlled, the information is publicly accessible, so even this post is potential training data to be scraped. That is harder to avoid, short of giving up the internet altogether, but at least avoiding the corpo routes is a good start.

[–] [email protected] 13 points 3 weeks ago

Captchas have been used to train AI for years; that's nothing new. IIRC, the reason you solve two is that one confirms you're human and the other provides training data.

[–] [email protected] 3 points 3 weeks ago

Bear in mind that, under this liberal interpretation, any time you access a website you are also consuming someone's labor, and if you don't have a subscription to it, that labor is unpaid.