this post was submitted on 23 Feb 2024
610 points (96.8% liked)
Technology
Question: Wouldn't Lemmy instances easily be able to do this without many users knowing?
And would they also be able to sell data from other instances, because they can load data from federated instances?
Basically yes, but unlike Reddit which has control over its proprietary network, Lemmy instances would have a hard time locking down access to create artificial scarcity for their data without causing other problems.
Technically? Probably, yes. Legally? I don't think so (never looked into it)
Why do you believe they wouldn't legally be able to?
It's the whole copyright question. Users own the copyright on their own posts, and it's the terms of service that are supposed to say what the server and other federated servers are allowed or not allowed to do with them. I don't even remember if there were terms of service when I joined Lemmy... But assuming there were, and they didn't explicitly say whether the server or federated servers can use user content to train AI, then it becomes a legal question that can only be settled by the courts.
Note that this determination will only apply in the country/state where that court is.
IANAL
And why would anyone believe they'd stop if it wasn't legal?
Nobody does
I don't have a problem with anyone scraping what's already public. I just don't want anyone to profit from simply selling the data I made for them. OpenAI is at least creating useful stuff. All Reddit ever did was be the middleman.
Reddit has a ton more content though.
Lemmy just has a lot of vintage memes
Agreed.
But it's a little hypocritical to avoid Reddit because of this if it turns out that Lemmy is much worse in this regard.
Yeah. Guess some AI companies may have set up an instance already. They won't even have a rate limit or anything on their own instances.