this post was submitted on 22 Dec 2024
1577 points (97.5% liked)

Technology


It's all made from our data, anyway, so it should be ours to use as we want

[–] [email protected] 3 points 1 day ago (1 children)

Are you threatening me with a good time?

First of all, whether these LLMs are "illegally trained" is still a matter before the courts. When an LLM is trained it doesn't literally copy the training data, so it's unclear whether copyright is even relevant.

Secondly, I don't think that making these models "public domain" would have the negative effects that people angry about AI think it would. When a company runs a closed model internally, like ChatGPT for example, the model is never available for download in the first place. It doesn't matter whether it's public domain or not, because you can't get a copy of it. When a company releases an open-weight model for public use, on the other hand, it usually encumbers the model with some sort of license that makes it harder for competitors to monetize or build on. Making those models public domain would greatly increase their utility. It might make future releases less likely, but in the meantime it would greatly enhance AI development.

[–] [email protected] 2 points 1 day ago (4 children)

The LLM does reproduce copyrighted data though.

[–] [email protected] 2 points 1 day ago

*it can produce data identical to data that has been copyrighted before

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago)

Only if they were trained on public material.

[–] [email protected] 1 points 1 day ago

Doesn't seem like this helps out all the writers / artists that the LLM stole from.
