124

Proton just joined the AI clown car show (proton.me)

submitted 4 months ago by [email protected] to c/[email protected]

55 comments

Fuck this shit, why does every fucking thing need an LLM?

[–] [email protected] 7 points 4 months ago (1 children)

I know that, at least with art, AI is starting to eat itself because of the massive output of content. AI is getting trained on more and more AI content and, according to what I've read at least, it's starting to affect new outputs.

Assuming that's true, it at least makes techie sense to me lol. I expect the same would happen to text-based AI as well, as more and more of the internet becomes exclusively AI generated.

[–] [email protected] 0 points 4 months ago (1 children)

The term "model collapse" gets brought up frequently to describe this, but it's very commonly misunderstood. There actually isn't a fundamental problem with training an AI on data that includes other AI outputs, as long as the training data is well curated to maintain its quality. That already needs to be done with non-AI-generated training data anyway, so it's not really extra effort. The research paper that popularized the term "model collapse" used an unrealistically simplistic approach: it just recycled all of an AI's output into the training set for subsequent generations of AI, without any quality control or additional training data mixed in.
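
To make the difference between those two setups concrete, here's a rough toy sketch. It is not an LLM and not the paper's actual experiment: the "model" is just a Gaussian fitted to samples, and the function name, sample size, and 50% fresh-data ratio are all assumptions picked for illustration. One run refits each generation purely on the previous generation's output; the other mixes in fresh samples from the real distribution every generation.

```python
import random
import statistics


def run_generations(n_generations, sample_size, fresh_fraction, seed=0):
    """Repeatedly refit a toy Gaussian "model" and return its std dev per generation.

    fresh_fraction is the share of each generation's training set drawn from
    the real distribution; the rest is sampled from the previous model.
    """
    rng = random.Random(seed)
    real_mu, real_sigma = 0.0, 1.0           # the "real data" distribution (assumed)
    model_mu, model_sigma = real_mu, real_sigma
    history = []
    for _ in range(n_generations):
        n_fresh = int(sample_size * fresh_fraction)
        n_synthetic = sample_size - n_fresh
        # Synthetic data: output of the previous generation's model.
        data = [rng.gauss(model_mu, model_sigma) for _ in range(n_synthetic)]
        # Fresh data: new samples from the real distribution.
        data += [rng.gauss(real_mu, real_sigma) for _ in range(n_fresh)]
        # "Training" here is just estimating mean and (population) std dev.
        model_mu = statistics.fmean(data)
        model_sigma = statistics.pstdev(data)
        history.append(model_sigma)
    return history


# Pure recycling: every generation trains only on the previous one's output.
recycled = run_generations(n_generations=500, sample_size=20, fresh_fraction=0.0)
# Mixed: half of each generation's training data is fresh real data.
mixed = run_generations(n_generations=500, sample_size=20, fresh_fraction=0.5)

print(f"final std dev, pure recycling:  {recycled[-1]:.3g}")
print(f"final std dev, 50% fresh data: {mixed[-1]:.3g}")
```

In the pure-recycling run the estimated spread shrinks toward zero over the generations, which is the toy version of "collapse". In the mixed run it stays in the neighborhood of the real spread. Real training pipelines are obviously far more complicated, so take this as an intuition pump only.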

[–] [email protected] 1 points 4 months ago (1 children)

"Well curated"

Say these claims are overhyped. Wouldn't we still reach a point where it's true, unless humans sit down and sift through what's allowed and what isn't?

[–] [email protected] 2 points 4 months ago

Not necessarily. Curation can also be done by AIs, at least in part.

As a concrete example, NVIDIA's Nemotron-4 is a system specifically intended for generating "synthetic" training data for other LLMs. It consists of two separate LLMs: Nemotron-4 Instruct, which generates text, and Nemotron-4 Reward, which evaluates the outputs of Instruct to determine whether they're good to train on.

Humans can still be in that loop, but they don't necessarily have to be. And the AI can help them in that role so that it's not necessarily a huge task.
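
The generate-then-score loop described above can be sketched roughly like this. To be clear, this is not NVIDIA's API or pipeline: `build_synthetic_set`, `toy_generate`, `toy_score`, and the threshold are all hypothetical stand-ins. In a real setup `generate` would call a generator LLM and `score` would call a reward model.

```python
from typing import Callable, List, Tuple


def build_synthetic_set(
    prompts: List[str],
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    threshold: float,
    samples_per_prompt: int = 3,
) -> List[Tuple[str, str]]:
    """Keep only the (prompt, response) pairs the scorer rates at or above threshold."""
    kept = []
    for prompt in prompts:
        for attempt in range(samples_per_prompt):
            response = generate(prompt, attempt)
            if score(prompt, response) >= threshold:
                kept.append((prompt, response))
    return kept


def toy_generate(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for a generator LLM: returns canned outputs of
    # varying quality depending on the attempt number.
    candidates = ["", "idk", f"A detailed answer to: {prompt}"]
    return candidates[attempt % len(candidates)]


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a reward model: longer responses score higher,
    # capped at 1.0. A real reward model would judge quality, not length.
    return min(len(response.split()) / 5, 1.0)


prompts = ["What is model collapse?", "Why curate training data?"]
kept = build_synthetic_set(prompts, toy_generate, toy_score, threshold=0.5)

for prompt, response in kept:
    print(f"{prompt!r} -> {response!r}")
```

A human reviewer could slot in after the filter and spot-check whatever ends up in `kept`, which is a much smaller job than reading every raw output.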