this post was submitted on 27 Dec 2023
Technology
No it doesn't, the training data isn't inside the LLM.
So firstly, even if those claims are true, you're suing the wrong business. You would need to sue whoever made the training data. They, however, are usually protected by laws for science, because they are "non-profit research".
Therefore this is completely ridiculous.
Btw, the copyright part is only a thing if it's a significant portion of the work... which it clearly isn't in this case (it's below 1% of it), making it even more ridiculous.
Also, if you can get the information on the internet, you are again suing the wrong party. You should be going after the provider, not the automatic data-grabbing system, as they can and will argue that they can't control what their crawler takes. There is a way to mark content as "don't use" for machines, but most people don't do that and will lose in court because they don't understand it...
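The "don't use" marker mentioned above is most commonly a robots.txt file, which well-behaved crawlers consult voluntarily. A minimal sketch of how a crawler could check it, using Python's standard library. The crawler name `ExampleAIBot`, the file contents, and the URL are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print("ExampleAIBot allowed:", may_fetch(ROBOTS_TXT, "ExampleAIBot", url))
    print("SomeOtherBot allowed:", may_fetch(ROBOTS_TXT, "SomeOtherBot", url))
```

Whether a particular crawler actually honors the file is up to that crawler; the sketch only shows what the check looks like.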
Lastly, the training wouldn't be harder; the problem is the gathering of data. You can't manually look through all of it, and it's idiotic to think that it's reasonable to demand such a thing.
This is factually incorrect. You can extract the data. How do you think the legal cases are being brought?
For example
The model has to contain the data in order to produce works.
Wholesale commercial copyright infringement, where you're profiting off of others' work on a large scale, is a whole different ball game.
They're training their models on large amounts of pirated content and profiting off it.
Of course the rights holders are going to say "wait a minute, why are you making money off my content without my permission? And how much of my work did you pirate to use?"
You cannot hand-wave away the mass piracy used to train their models, or the distribution of models that are built on an act of mass copyright infringement.
Do you not understand the basics of the law?
Again, the law is the law. If they mass-pirate a bunch of media and the model then contains chunks of it, they are breaking the law.
I can't believe this is a hard concept for someone to understand.
This entire comment screams of 0 technical knowledge.
The LLM does not contain the training data. It contains nothing but math: it generates an answer by calculation, and in the end you get the answer which is statistically most likely what you want. Otherwise the fucking thing wouldn't produce fake news and make shit up.
Sure, if you want it to write you a very specific thing and you know exactly what to ask, you might get a small text that is "copyrighted", but that's because you asked for it, not because it's inside. It just gives you the answer you will most likely find helpful, statistically.
It's like asking you to read a page very carefully and then asking you the next day to write down what was on the page, while giving you lots of hints. You didn't actually copy from it in that case.
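To make "statistically most likely" concrete, here is a toy sketch that picks the next word purely from counts of which word followed which. This is an assumption-laden illustration: real LLMs use learned neural network weights rather than a count table, the corpus is invented, and the example says nothing either way about how much text a real model can reproduce.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of word, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_words=5):
    """Greedily extend start by always taking the most likely next word."""
    out = [start.lower()]
    for _ in range(max_words - 1):
        nxt = most_likely_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    # Invented toy corpus, not real training data.
    corpus = "the cat sat on the mat and the cat slept"
    model = train_bigram(corpus)
    print(most_likely_next(model, "the"))
    print(generate(model, "on", max_words=3))
```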
Yes, your comment does.
There is literally software to extract this stuff from models now.
This "it's just math" is techbro idiocy. It's like the idiots regurgitating crypto coin bullshit.
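Whether a given output reproduces a source can at least be measured. The sketch below is not the extraction software referred to above; it is a hypothetical helper that finds the longest run of consecutive words shared between a model's output and a reference text, using Python's `difflib`. Both example strings are invented.

```python
from difflib import SequenceMatcher


def longest_shared_run(output, reference):
    """Return the longest run of consecutive words found in both texts."""
    out_words = output.lower().split()
    ref_words = reference.lower().split()
    # autojunk=False keeps difflib from discarding very frequent words.
    matcher = SequenceMatcher(None, out_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(ref_words))
    return out_words[match.a : match.a + match.size]


if __name__ == "__main__":
    # Invented strings standing in for a source text and a model output.
    reference = "the quick brown fox jumps over the lazy dog near the river"
    output = "my story says the quick brown fox jumps over a sleeping cat"
    run = longest_shared_run(output, reference)
    print(len(run), "shared words:", " ".join(run))
```

A long shared run suggests verbatim reproduction and a short one suggests paraphrase at most; where the legally relevant threshold lies is a separate question.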