this post was submitted on 28 Dec 2023
Technology
Not the original commenter, but I think the difference you're looking for is in the copying and distribution. The OC makes the false assumption that the data set holds full copies of every object fed into it, rather than sets of common characteristics.
For example, my own mind has a concept of "tree". That concept is not a copy of every tree I've ever known, but more like a list of common characteristics that define treeness, based on information I've gathered about trees (my data set).
Piracy is piracy not because of how the work is consumed, but because of how it's distributed and stored: as full copies of the object. Datasets are not copies, in other words. And thus copyright doesn't apply.
Reading an article to get an idea of what articleness is, is fair use. Reading an article to reproduce it verbatim is not. And as of now, I don't believe LLMs are doing the latter.