this post was submitted on 03 Sep 2024

1578 points (97.8% liked)

OpenAI Pleads That It Can’t Make Money Without Using Copyrighted Materials for Free (futurism.com)

submitted 2 months ago by [email protected] to c/[email protected]

462 comments

[–] [email protected] 256 points 2 months ago (3 children)

Yeah! I can't make money running my restaurant if I have to pay for the ingredients, so I should be allowed to steal them. How else can I make money??

Alternatively:

OpenAI is no different from pirate streaming sites in this regard (loosely: streaming sites are way more useful to humanity). If OpenAI gets a pass, so should every site that's been shut down for piracy.

[–] [email protected] 111 points 2 months ago (2 children)

If OpenAI wants a pass, then just like how piracy services make content freely open and available, they should make their models open.

Give me the weights, publish your datasets, slap on a permissive license.

If you're not willing to contribute back to society with what you used from it, then you shouldn't exist within society until you do so.

[–] [email protected] 58 points 2 months ago (1 children)

Piracy steals from the rich and gives to the poor. ChatGPT steals from the rich and the poor and keeps for itself.

[–] [email protected] 8 points 2 months ago

and keeps for itself.

Which is why they should be legally compelled to publicize all of their datasets, models, research, and share any profits they've made with the works they can get provenance data for, because otherwise, it's an unfair use of the public sphere of content.

One could very easily argue that adblockers are piracy, and those would be stealing from every social media creator, small blog, and independent news site, but I don't see many people arguing against that, even though that very well includes people who aren't wealthy corporations.

The issue isn't necessarily the use of the copyrighted content, it's the unfair legal stance taken on who can use the content, and how they are allowed to profit (or not profit) from it.

I'm not saying there are no downsides, but I do feel like a simple black and white dichotomy doesn't properly outline how piracy and generative AI training are relatively similar in terms of who they steal from, and it's more of a matter of what is done with the content after it is taken that truly matters most.

[–] [email protected] 19 points 2 months ago (2 children)

No they shouldn’t. They should cease to exist

[–] [email protected] 12 points 2 months ago (2 children)

Good luck putting the cat back in the bag.

[–] [email protected] 7 points 2 months ago

I have cats. Putting them back in a bag or box is easier

[–] [email protected] 2 points 2 months ago (1 children)

Well, if everyone whose copyrighted work was used independently sues OpenAI, that cat will be deceased real quick due to bankruptcy.

[–] [email protected] 3 points 2 months ago

Fuck copyright. They used GPLv3 code, so why isn't it open source?

[–] [email protected] 3 points 2 months ago (3 children)

Generative AI is not going back into the bag. If not OpenAI, then someone else will control it. So we deal with them the next best way, force them to serve us, the people.

[–] [email protected] 26 points 2 months ago (1 children)

Then they can either pay for the copyrighted data they want to train on or lobby for copyright to be reined in for everyone. Right now, they're acting like entitled twats with a shit business model demanding they get a free pass while the rest of us would be bankrupted for downloading a Metallica MP3.

[–] [email protected] 4 points 2 months ago

I think this better solves the issue.

The problem isn't necessarily the use of copyrighted works (although it can be a problem in many ways); it's the unfair legal determination of who is allowed to do so.

[–] [email protected] 13 points 2 months ago

Nobody should profit from copyright violation. Yes, copyright law needs to change, but making money isn’t an exception

[–] [email protected] 3 points 2 months ago

Generative AI is not going back into the bag.

It probably will, though, once model collapse sets in.

That's the irony, really... the more successful it is, the sooner it'll poison itself to death.

[–] [email protected] 7 points 2 months ago (1 children)

This is actually a very good comparison because restaurants use this argument all the time, except for wages:

"I can't make money running my restaurant if I have to pay a living wage to my servers, so you should pay them with tips. How else can we stay open?"

These businesses that can't operate profitably like any other business should fail.

[–] [email protected] 1 points 2 months ago

In China, tipping is considered insulting because you are implying exactly that: that they are incapable of running their business without your donation.

[+] [email protected] -28 points 2 months ago (2 children)

K, so Google should be shut down too?

They can't operate without scraping copyrighted data.

[–] [email protected] 27 points 2 months ago* (last edited 2 months ago)

This is a false equivalency.

Google used to act as a directory for the internet along with other web search services. In court, they argued that the content they scraped wasn't easily accessible through the searches alone and had statistical proof that the search engine was helping bring people to more websites, not preventing them from going. At the time, they were right. This was the "good" era of Google, a different time period and company entirely.

Since then, Google has parsed even more data, made that data easily available in the Google search results pages directly (avoiding link click-throughs), increased the number of services they provide to the degree that they have a conflict of interest on the data they collect and a vested interest in keeping people "on Google" and off the other parts of the web, and participated in the same bullshit policies that OpenAI started with their Gemini project. Whatever win they had in the 2000s against book publishers, it could be argued that the rights they were "afforded" back in those days were contingent on them being good-faith participants and not competitors. OpenAI and "summary" models that fail to reference sources with direct links, make hugely inaccurate statements, and generate "infinite content" by mashing together letters in the world's most complicated Markov chain fit in this category.

It turns out, if you're afforded the rights to something on a technicality, it's actually pretty dumb to become brazen and assume that you can push these rights to the breaking point.

[–] [email protected] 17 points 2 months ago (1 children)

Google (and search engines in general) is at least providing a service by indexing and making discoverable the websites they crawl. OpenAI is just hoovering up the data and providing nothing in return. Socializing the cost, privatizing the profits.

[+] [email protected] -16 points 2 months ago* (last edited 2 months ago) (1 children)

Uh, that's objectively false.

OpenAI also provides ChatGPT as a "free" service, and Google has made billions off of that "free" service they oh so altruistically provide you.

[–] [email protected] 26 points 2 months ago (1 children)

Google points to your content so others can find it.

OpenAI scrapes your content to use to make more content.

[+] [email protected] -23 points 2 months ago (3 children)

That's not a meaningful distinction, I spent all day using a Copilot search engine because the answers I wanted were scattered across a bunch of different documentation sites.

It was both using the AI models to interpret my commands (not generation at all) and only publishing content to me specifically.

[–] [email protected] 14 points 2 months ago

I’m talking about the training phase of LLMs. That is the portion that is doing the scraping and generation of copyrighted data.

You using an already trained LLM to do some searches is not the same thing.

[–] [email protected] 13 points 2 months ago (1 children)

Technically it is meaningful: fair use is specifically for things that don't replace the original in function.

[–] [email protected] -3 points 2 months ago (1 children)

Depends on what the function was. If the function was to drive ad revenue to your site, then sure, if the function was to get information into the public, then it's not replacing the function so much as altering and updating it.

[–] [email protected] 4 points 2 months ago (1 children)

If that "altering and updating" means people don't need to read the original anymore, then it's not fair use.

TBH I'm for reining in copyright substantially, and would be on the shitty text generator company side of this, but only if it makes a precedent and erodes copyright as a whole instead of just creating a carveout if you have a lot of money for lawyers.

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago)

I generally agree, but I really think people in this thread are being overly dismissive about how useful LLMs are, just because they're associated with techbros who are often associated with relatively useless stuff like crypto.

I mean most people still can't run an LLM on their local machine, which vastly limits what developers can use them for. No video game or open source software can really include them in any core features because most people can't run them. Give it 3 years when every machine has a dedicated neural chip and devs can start using local LLMs that don't require a cloud connection and Azure credits and you'll start seeing actually interesting and inventive uses of them.

There's still problems with attributing sources of information but I honestly feel like if all LLMs that were trained on copyrighted data had to be published open source so that anyone could use them it would get us enough of the way there that their benefits would outweigh their costs.

[–] [email protected] 10 points 2 months ago (1 children)

It's absolutely a meaningful distinction. Search engines push people to your website, where you can capitalize on your audience however you see fit. LLMs take your content, throw it through the mixer, and sell it back to people. It's the difference between a movie reviewer explaining a movie and a dude in an alley selling a pirated copy of the movie.

[–] [email protected] -1 points 2 months ago* (last edited 2 months ago)

A) An LLM does not inherently sell you anything. Some companies charge you to run and use their LLMs (OpenAI), and some companies publish their LLMs open source for anyone to use (Meta, Microsoft). With neural chips starting to pop up in PCs and phones, pretty soon anyone will be able to run an open source LLM locally on their machine, completely for free.

B) LLMs still rarely regurgitate the exact same original source. This would be more like someone in the back alley putting on their own performance of the movie and morphing it and adjusting it in real time based on your prompts and comments, which is a lot closer to parody and fair use than blatant piracy.