this post was submitted on 28 Dec 2023

328 points (97.4% liked)



The New York Times sues OpenAI and Microsoft for copyright infringement (edition.cnn.com)

submitted 10 months ago by [email protected] to c/[email protected]

49 comments

The New York Times has sued OpenAI and Microsoft for copyright infringement, alleging that the companies’ artificial intelligence technology illegally copied millions of Times articles to train ChatGPT and other services to provide people with information – technology that now competes with the Times.


[email protected] 11 points 10 months ago

This person seems not to know very much about what they are talking about, despite their confidence in saying it.

It looks like they think the reason AI output can't be copyrighted is that it's been "ruled a derivative work," but that's not the reasoning the court provided. The reasoning was that copyright can only protect human creativity, so machine output without human involvement can't be copyrighted, with the judge noting that it's unclear where the line falls on how much human contribution is needed.

The other suits claiming the models are derivative works have either yet to be resolved or, in some cases, been thrown out.

Even in one of the larger suits over whether training LLMs is infringement, the derivative-work claim has been thrown out:

Chhabria, in his ruling, called this argument “nonsensical,” adding, “There is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.”

Additionally, Chhabria threw out the plaintiffs’ argument that every LLaMA output was “an infringing derivative” work and “constitutes an act of vicarious copyright infringement”; that LLaMA was in violation of the Digital Millennium Copyright Act; and that LLaMA “unjustly enriched Meta” and “breached a duty of care ‘to act in a reasonable manner towards others’ by copying the plaintiffs’ books to train LLaMA.”

Parts of Sarah Silverman-Led AI Copyright Case Against Meta Dismissed, But Not Core Argument

Social media has really turned into a confirmation bias echo chamber where misinformation can run rampant when people make unsourced broad claims that are successful because they "feel right" even if they aren't.

Perhaps the reason hallucination is such a problem for LLMs is that in the social media data that's a large chunk of their training everyone is so full of shit?

[email protected] 3 points 10 months ago (last edited 10 months ago)

Perhaps the reason hallucination is such a problem for LLMs is that in the social media data that’s a large chunk of their training everyone is so full of shit?

Heh. I think it simply shows us that the fundamental principle of artificial neural nets really captures how the brain works.

[email protected] 1 point 10 months ago

Social media has really turned into a confirmation bias echo chamber where misinformation can run rampant

Honestly, this can easily be overstated in the case of social media relative to anything else humanity does. By and large, no one knows anything, and everyone is happy talking and speculating as they do. It was true before social media and it will be true after.

The fun part is trying to make sense of it all, which is why I said “interesting”.

I’ve personally thought the copyright dimension is one of the more interesting aspects of AI in the short and medium term, and have for years. Happy to hear takes and opinions on the issue, especially as I’m not plugged into the space any more.

[email protected] 4 points 10 months ago

If you want an interesting take from a source that understands both the tech and legal sides for real and not just pretend, see this from the EFF:

https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0

It's about the diffusion art models and not LLMs, but most of its points still apply (even the point about a stronger case regarding outputs by plaintiffs whose works can be reproduced, such as in this case).

[email protected] 1 point 10 months ago

Cheers!

[email protected] 1 point 10 months ago

My immediate reaction to the piece is that, insofar as it’s trying to predict the path the courts will take, the author may be too close to the tech, whereas I can imagine judges readily opting to eschew what they’d consider excessive technical detail in their reasoning. I’m curious to see how true that is.

For me the essential point, made at the end, is what creators really want from copyright apart from more money, because any infringement case against AI could easily spell oppressive copyright law.

I’m curious to see whether a key factor in this is how the courts conceive of what the AI actually is and does. The “one byte per work” argument, for instance, may come off as naive and lead a judge to construct their own model of what’s happening.

Otherwise, for the purposes of this thread and the take I posted from Mastodon, I’d say the question of whether AI creates copyrightable works, how the broader industries respond to that, and what’s legally required of them stands as fundamental in the medium term.

Now curious to see what legal scholarship is predicting, which in some cases is probably a better predictor.

[email protected] 2 points 10 months ago

Here's the author's bio:

Kit is a senior staff attorney at EFF, working on free speech, net neutrality, copyright, coders' rights, and other issues that relate to freedom of expression and access to knowledge. She has worked for years to support the rights of political protesters, journalists, remix artists, and technologists to agitate for social change and to express themselves through their stories and ideas. Prior to joining EFF, Kit led the civil liberties and patent practice areas at the Cyberlaw Clinic, part of Harvard's Berkman Center for Internet and Society, and previously Kit worked at the law firm of Wolf, Greenfield & Sacks, litigating patent, trademark, and copyright cases in courts across the country.

Kit holds a J.D. from Harvard Law School and a B.S. in neuroscience from MIT, where she studied brain-computer interfaces and designed cyborgs and artificial bacteria.

The author is well aware of the legal side of things.

[email protected] 2 points 10 months ago

Oh I’m sure, and it was a good article to be clear. But “the legal side of things”, especially from a certain perspective, and what the courts (and then the legislature) do with a new-ish issue can be different things.

[email protected] 1 point 10 months ago

"Kit worked at the law firm of Wolf, Greenfield & Sacks, litigating patent, trademark, and copyright cases in courts across the country. Kit holds a J.D. from Harvard Law School"

The EFF is primarily a legal group and the post straight up mentions that it is a legal opinion on the topic.

So I'm not really clear what you mean by "the legal side of things" as something separate from what a lawyer who has litigated IP cases and works at the intersection of law and tech says about a pending case in a legal opinion.

Do you just mean a different opinion from different lawyers?

[email protected] 1 point 10 months ago

Yea. Legal opinions vary and legal scholars can have problems, sometimes massive, with what courts and legislators end up doing. “Legal side of things”, in quotes, was intended to convey a cynicism/critique of the idea, belief or even desire some might have (not saying you) for the law to be “settled” and clear.

[email protected] 2 points 10 months ago

Ah, well, if you want, the Columbia Journalism Review has a good summary of the developments and links to various other opinions, particularly in the following paragraph:

According to a recent analysis by Alex Reisner in The Atlantic, the fair-use argument for AI generally rests on two claims: that generative-AI tools do not replicate the books they’ve been trained on but instead produce new works, and that those new works “do not hurt the commercial market for the originals.” Jason Schultz, the director of the Technology Law and Policy Clinic at New York University, told Reisner that there is a strong argument that OpenAI’s work meets both of these criteria. Elsewhere, Sy Damle, a former general counsel at the US Copyright Office, told a House subcommittee earlier this year that he believes the use of copyrighted work for AI training is categorically fair (though another former counsel from the same agency disagreed). And Mike Masnick of Techdirt has argued that the legality of the original material is irrelevant. If a musician were inspired to create new music after hearing pirated songs, Masnick asks, would that mean that the new songs infringe copyright?

(Most of those opinions are linked)

[email protected] 1 point 10 months ago

Thanks!

[email protected] 1 point 10 months ago

I assume they mean that on topics not thoroughly tested in court, a judge can always make up their own mind based on how they personally feel.

[email protected] 1 point 10 months ago

Right, but how would that be reflected in another article ahead of time?