this post was submitted on 30 May 2024

1104 points (98.8% liked)


It's so over (lemmy.world)

submitted 1 year ago by [email protected] to c/[email protected]

141 comments

[–] [email protected] 3 points 1 year ago (1 children)

That's not how GPTs work

[–] [email protected] -2 points 1 year ago (2 children)

That's literally how they work

[–] [email protected] 4 points 1 year ago (1 children)

Man, the models can't store their training data verbatim. The data is turned into a model that is hundreds or thousands of times smaller than the original source data. If it were capable of simply recovering everything it was trained on, it would be some magical compression algorithm, and that by itself would be extremely impressive.

[–] [email protected] -4 points 1 year ago (1 children)

Congratulations on discovering compression

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago) (2 children)

Oh ok, so you want to claim this compresses the entirety of the internet into a model that isn't even 1 terabyte, and still be unimpressed. That is something.

But it isn't compression. It is a mathematical fact that neural networks are universal function approximators; this is undisputed. Analytic functions are continuous, so to approximate one the network must be able to fill in the gaps between discrete data points by itself, which necessarily means spitting out data that is not in its training set, data it has not seen.

[–] [email protected] 2 points 1 year ago (1 children)

TBF, compression is related to ML. Hence, the Hutter Prize. Thinking of LLMs as lossy compression algorithms is a decent analogy.

[–] [email protected] 0 points 1 year ago

It is a partial analogy. It accounts for the outputs that are related to some specific training data and disregards the outputs that cannot be directly related to any specific training data.

For example, make up a new meme template and a new joke on the spot. The model couldn't have seen them before if you make sure your joke and template are new. If the AI can explain the joke, then compression is a horrendous analogy.

Lossy compression explains outputs being similar but not identical when trying to recover the original data. It doesn't explain brand new content that makes sense standalone. Imagine lossy audio compression resulting in a brand new song midway through playback, or lossy image compression resulting in a brand new coherent image being overlaid onto some pixels of the original image. That is not what happens. Lossy audio compression results in noise and lossy image compression results in noise, not in coherent unheard songs and unseen images.

[–] [email protected] 2 points 1 year ago (1 children)

They do not store anything verbatim; they instead store the directions in which various words and related concepts relate to one another in some gigantic multidimensional space.

I highly suggest you go learn what they actually do before you continue talking out of your ass about them
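To make the "directions in a multidimensional space" idea concrete, here is a minimal sketch. The four-dimensional vectors are hand-picked and hypothetical (real models learn vectors with far more dimensions from data), and the `cosine` and `analogy` helpers are names invented for this illustration.

```python
import math

# Hypothetical toy embeddings, chosen by hand for illustration only.
# Dimensions loosely mean: (royalty, male, female, fruit).
EMBEDDINGS = {
    "king":  [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "man":   [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest in direction to vec(a) - vec(b) + vec(c)."""
    target = [
        x - y + z
        for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, EMBEDDINGS[w]))


print(analogy("king", "man", "woman"))
```

Nothing in the table is a stored sentence. The relationship between the words comes only from how the vectors point relative to one another.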

[–] [email protected] 0 points 1 year ago (1 children)

If you trained a GPT on a single phrase, all you'd get out of it would be the single phrase.

The mechanism of storage doesn't need to be just the verbatim source material, which is not even close to what I said.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

You said it matches text to its training data, which it does not do.

Your single-phrase statement only works for very short, non-repetitive phrases. As soon as your phrase repeats a token more than a few times, the statistics for the tokens change and could result in nonsensical output that repeats through subsections of the training data.

And even then, for that single non-repetitive phrase, the reason you would get the phrase back is not that the model is "matching on" it. It is because the token weights would effectively encode that the statistical likelihood of the "next token" in the generated output is 100% for a given token when the evaluated token precedes it in the training phrase. Or in other words: your training data being a single phrase manipulates the statistics so that the most likely output is that single phrase.

However, that is a far cry from simple "matching" against the training data. Which is what you said it does.
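The single-phrase argument can be sketched with a toy bigram counter. This is a deliberate simplification and not how a GPT works internally: a transformer conditions on the whole preceding context, not only the last token. The function names here are made up for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which token follows which in the training text."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_probs(counts, token):
    """Turn the raw follower counts for one token into probabilities."""
    followers = counts.get(token)
    if not followers:
        return {}
    total = sum(followers.values())
    return {t: n / total for t, n in followers.items()}


def generate(counts, start, max_tokens, rng):
    """Sample a continuation one token at a time from the statistics."""
    out = [start]
    for _ in range(max_tokens - 1):
        probs = next_token_probs(counts, out[-1])
        if not probs:
            break  # no known follower: stop generating
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights)[0])
    return " ".join(out)


single = train_bigram("it is so over")
print(next_token_probs(single, "so"))
print(generate(single, "it", 10, random.Random(0)))

repeated = train_bigram("the cat saw the dog")
print(next_token_probs(repeated, "the"))
```

With the non-repetitive phrase, every next-token probability is 100%, so the phrase comes back without any lookup against the text. Once "the" appears twice, its followers split 50/50, and in this toy model generation can wander through the phrase instead of reproducing it.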

[–] [email protected] -2 points 1 year ago (1 children)

If it doesn't use its training data, what's the training data for?

[–] [email protected] 2 points 1 year ago (1 children)

Analysis. It uses it, but not by "matching it". The training data is not included in the final model. No GPT can access its training data at runtime.

Training analyzes the contents of the training data and creates a statistical model representing the likelihoods of various tokens based on a complex series of mathematical transformations that encode various attributes of the tokens making up the training data.

3Blue1Brown has a great series on the actual math behind it, I would highly recommend educating yourself on what GPTs actually do. It's way more interesting than simple matching.
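As a rough illustration of "the training data is not included in the final model", here is a toy character-level sketch. It is an assumption-laden stand-in, nowhere near a real GPT: the "model" is just a fixed 27x27 table of next-character probabilities, and `train` and `parameter_count` are hypothetical names.

```python
import string

ALPHABET = string.ascii_lowercase + " "  # 27 symbols
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def train(text):
    """Return a 27x27 table of next-character probabilities.

    The text is only read while counting; it is not kept in the result.
    """
    size = len(ALPHABET)
    counts = [[0] * size for _ in range(size)]
    chars = [c for c in text.lower() if c in INDEX]
    for a, b in zip(chars, chars[1:]):
        counts[INDEX[a]][INDEX[b]] += 1
    table = []
    for row in counts:
        total = sum(row)
        table.append([n / total if total else 0.0 for n in row])
    return table


def parameter_count(table):
    """Number of stored values in the trained table."""
    return sum(len(row) for row in table)


short_text = "it is so over"
long_text = "it is so over " * 10_000

short_model = train(short_text)
long_model = train(long_text)

print(len(short_text), parameter_count(short_model))
print(len(long_text), parameter_count(long_model))
```

The table has the same number of entries whether it was trained on 13 characters or 140,000. What survives training is the statistics, and generation only ever consults those numbers.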

[–] [email protected] -2 points 1 year ago* (last edited 1 year ago) (1 children)

God forbid I use simpler language to describe what it does.

It's pattern matching with extra steps.

[–] [email protected] 0 points 1 year ago (1 children)

Simpler language is fine when it's accurate.

Your simplification is inaccurate and could mislead people into thinking GPTs are just advanced regex matching engines.

They are not. They are closer to autocorrect on steroids.

[–] [email protected] 1 points 1 year ago (1 children)

Autocorrect is fancy pattern matching. GPT is fancier pattern matching.

It's more accurate than "AI," since there's no actual reasoning happening.

[–] [email protected] 0 points 1 year ago (1 children)

I'm gonna stop responding to this asinine thread now before you continue to demean us both with your nonsense.

[–] [email protected] 1 points 1 year ago

Have fun matching patterns for the rest of the day!