Zima

joined 1 year ago
[–] [email protected] 1 points 10 months ago (1 children)

Lol. You already forgot you claimed that they need to retain the training data first.

[–] [email protected] 1 points 10 months ago (5 children)

You still haven’t backed up your claim. Once again: just because you don’t know how to do something doesn’t mean it’s not possible.

[–] [email protected] 1 points 10 months ago (7 children)

Ok, I believe that you believe that. It’s ok. I have professional experience in this space, so either you’re not reading carefully or you don’t understand much about the topic.

Perhaps you might want to reconsider this in more abstract terms. The engine example you ignored could help you with that.

Do you really think that the existence of language models that don’t memorize, and that are simple enough for us to know this for certain, isn’t all we need to show that language models don’t necessarily have to memorize? You keep repeating the same (illogical) argument and ignoring the simpler arguments that disprove your claim.

[–] [email protected] 1 points 10 months ago

Are you trolling? If you design a car to combust gasoline without burning the lubricants, but you still end up burning them, that doesn’t mean the lubricants are needed for the combustion itself. Conversely, you have not made any nuanced argument explaining why memorization is necessary. I gave you an example where we know there is no memorization, and you ignored it.

“Otherwise how would it create the words” is just an admission that you don’t know.

[–] [email protected] 1 points 10 months ago (12 children)

You would probably claim I don’t deserve my job, given the technical illiteracy you think you’re inferring. Anyway, they do make reasonable efforts to design models that don’t memorize and are able to generalize. This is quite basic and fundamental in machine learning.

Previous models had semantic reasoning capacity without memorization, e.g. word2vec.

You should also realize that just because current models memorize despite efforts to prevent it doesn’t mean that models need to memorize. Like I said initially, they are actually designed to work without needing to memorize.
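To make the word2vec point concrete, here is a back-of-the-envelope comparison. The figures are approximate: the published Google News vectors have a vocabulary of roughly 3 million words and phrases at 300 dimensions, trained on roughly 100 billion words; the average bytes-per-word is my own rough assumption.

```python
# Rough size comparison: a word2vec model vs. the corpus it was
# trained on. If the model is orders of magnitude smaller than the
# corpus, it cannot be storing that corpus verbatim.

vocab_size = 3_000_000           # ~3M words/phrases (Google News vectors)
dims = 300                       # 300-dimensional embeddings
bytes_per_float = 4              # float32

model_bytes = vocab_size * dims * bytes_per_float

corpus_words = 100_000_000_000   # trained on ~100 billion words
avg_bytes_per_word = 6           # rough assumption, incl. whitespace

corpus_bytes = corpus_words * avg_bytes_per_word

print(f"model:  {model_bytes / 1e9:.1f} GB")   # ~3.6 GB
print(f"corpus: {corpus_bytes / 1e9:.1f} GB")  # ~600 GB
print(f"the corpus is ~{corpus_bytes / model_bytes:.0f}x larger than the model")
```

Of course this only shows that word2vec can’t be memorizing its whole corpus; it doesn’t by itself settle what much larger models do.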

[–] [email protected] 4 points 10 months ago* (last edited 10 months ago)

that's the theory. previous models were also supposed to be doing 3-digit math, but they discovered that the questions were in the training data.

so you should look into what happens when people ask ChatGPT to repeat a word forever: it prints the word for a while and then starts printing training data. check this link https://www.404media.co/google-researchers-attack-convinces-chatgpt-to-reveal-its-training-data/

edit: relevant part:

It also, crucially, shows that ChatGPT’s “alignment techniques do not eliminate memorization,” meaning that it sometimes spits out training data verbatim. This included PII, entire poems, “cryptographically-random identifiers” like Bitcoin addresses, passages from copyrighted scientific research papers, website addresses, and much more.

“In total, 16.9 percent of generations we tested contained memorized PII,”

I should also reiterate that I agree the intent is to avoid memorization, but they have not been successful yet.
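as an aside, "memorized" in that research is checked mechanically: a generation counts as memorized when it shares a long enough verbatim token sequence with the training set (on the order of 50 tokens in the paper). a toy sketch of that check, with a made-up corpus, made-up outputs, and a 5-word threshold instead of 50 tokens:

```python
# Toy version of a verbatim-memorization check: an output counts as
# "memorized" if any length-n word sequence in it also appears
# verbatim in the training corpus. Corpus and outputs are made up.

def ngrams(words, n):
    """All contiguous length-n word tuples in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_memorized(output: str, corpus: str, n: int = 5) -> bool:
    """True if output shares any verbatim n-word sequence with corpus."""
    return bool(ngrams(output.split(), n) & ngrams(corpus.split(), n))

corpus = "the quick brown fox jumps over the lazy dog every single day"

print(is_memorized("a quick brown cat jumps over the fence", corpus))  # False
print(is_memorized("fox jumps over the lazy dog", corpus))             # True
```

the catch is that running this audit requires access to (a proxy for) the training set, which is part of why outsiders find it hard to measure memorization in closed models.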

[–] [email protected] 2 points 10 months ago (14 children)

"The model has to contain the data in order to produce works."

as far as I understand, this isn't true. can you elaborate on why it needs to contain the data?

[–] [email protected] 5 points 10 months ago* (last edited 10 months ago) (2 children)

the poem poem poem thing shows that the llms actually do memorize at least some training data. chatgpt changed their eula to forbid users from asking it to repeat words forever after this was in the news.

also, as far as I understand, there are usually fair use and non-profit exceptions for use of training data, but they generally limit how it can be used. so training a model for commercial purposes might be against the license of the training data.

I don't necessarily agree with the nyt, but they seem to be framing this as someone aggregating their data and packaging it in a better way, thereby hurting their profits. i don't really see that as necessarily being true. they could argue the same about google news showing their news...

[–] [email protected] -1 points 10 months ago* (last edited 10 months ago) (1 children)

at least the way a socialist teacher taught me in primary school (and i don't completely agree with him, but it's a good characterization), you have desirable values of freedom and equality, and they are in conflict. again, i don't necessarily agree with that, and it boils down to the fact that when equality is implemented it's always by averaging everyone down, which comes at the expense of freedom. anyway, supposedly you have capitalism as a system that places freedom above equality, and communism as a system that places equality above freedom. so it's not really about good and bad but a conflict of virtues.

it's completely beside the point, but i do rank freedom slightly above equality. in reality i would like to ensure some minimum level of support for everyone; i think that should be a pretty low level of support. just the bare minimum, e.g. ensured education and an equal chance at success in life, and health care depending on the actual amount of resources that can be allocated to it; nothing unrealistic, just the minimum to live without suffering, including other stuff like food, clothing, and shelter as well. and then the freedom so that if anyone wants more than the minimum, they can work for it. i'm sure the people who wanted to work would be able to produce enough value to provide that minimum life support for everyone. so about 80% freedom and 20% equality.

[–] [email protected] 0 points 10 months ago

I don’t think it’s an inevitable result. As we get better at handling complex systems, we might have a chance at a more efficient planned economy (which would still have issues, but so does capitalism); we are not there yet. I do think that capitalism is the best system we can currently use. That doesn’t mean it’s not flawed.

I don’t think my opinion hinges on how relevant I find the economic calculation problem or worker productivity. It’s the famines that follow its implementation that I find “relevant”.
