this post was submitted on 06 Feb 2025

751 points (99.6% liked)

Technology

This is a most excellent place for technology news and articles.

'Meta Torrented over 81 TB of Data Through Anna's Archive, Despite Few Seeders' (torrentfreak.com)

submitted 3 days ago* (last edited 3 days ago) by [email protected] to c/[email protected]

[–] [email protected] 46 points 2 days ago (4 children)

Oh look, another tech giant treating open knowledge initiatives like their personal data buffet. Let me translate this corporate nonsense for you:

Meta: "We need training data for our AI!" Also Meta: Let's leech 81.7TB from a community project without contributing anything back.

The absolute audacity of downloading terabytes through torrents while their employees were internally admitting it was "legally problematic". And the best part? They couldn't even be bothered to seed properly - just grab and go, classic corporate behavior.

Remember when companies actually contributed to open source instead of just parasitically consuming it? But no, they'd rather burden volunteer-run projects with massive bandwidth costs while their lawyers probably bill more per hour than these projects' entire monthly budget.

Pro tip Meta: If you're going to pilfer knowledge from the commons, at least seed back properly. Your "move fast and break things" motto isn't supposed to apply to community archives.

[–] [email protected] 3 points 1 day ago

Not seeding is crazy ...

[–] [email protected] 6 points 2 days ago (2 children)

My seedbox is locked and loaded, please point me to the torrent in need. Archive team, assemble!

[–] [email protected] 2 points 1 day ago

Yes please support annas-archive!! It is a wonderful project. I can essentially get an epub file for any book (including banned books) I want. They have so much more than that too.

[–] [email protected] 6 points 2 days ago (1 children)

https://annas-archive.org/

This is the website listed in the article

[–] [email protected] 1 points 2 days ago

An alternate domain https://annas-archive.li/

[–] [email protected] 62 points 3 days ago

Just gotta love these big tech companies and their bullshit double standards.

[–] [email protected] 304 points 3 days ago (5 children)

Do it, Judge. Protect the wealthy and say it's not piracy. Do it.

[–] [email protected] 161 points 3 days ago (1 children)

It's not piracy. For corporations. For you and me believe it or not, straight to jail!

[–] [email protected] 36 points 3 days ago (1 children)

Just make an LLC, now it's legal again.

[–] [email protected] 25 points 3 days ago* (last edited 3 days ago) (1 children)

I'd almost like to think an LLC would be enough, but I suspect that only works if you also have a billion in VC funding and political connections.

[–] [email protected] 25 points 3 days ago

Oh for sure, since the law is basically toilet paper for billionaires at this point.

[–] [email protected] 41 points 3 days ago

Please! Think of the shareholders, we must protect them!

[–] [email protected] 206 points 3 days ago (1 children)

[–] [email protected] 29 points 3 days ago

Rules for thee, not for me

[–] [email protected] 158 points 3 days ago (1 children)

Anna's Archive: Mirror our database, help us preserve Humanity's knowledge

Facebook: I'll just torrent what I need, see yaa

These big tech monopolies are a curse to humanity..

[–] [email protected] 55 points 3 days ago (1 children)

Facebook: I’ll just ~~torrent what I need~~ burden your underfunded project and volunteers with over 81 TB of bandwidth costs without contributing anything in return, see yaa

FTFY

[–] [email protected] 14 points 3 days ago (1 children)

Yeah the least they could do is seed forever.

[–] [email protected] 164 points 3 days ago (7 children)

But did they keep a good ratio though?

[–] [email protected] 142 points 3 days ago (2 children)

1000% guarantee those mf's had their upload choked to 20kbps

[–] [email protected] 73 points 3 days ago* (last edited 3 days ago) (1 children)

Nah they used a leeching client. No upload at all.

[–] [email protected] 19 points 3 days ago (1 children)

Gotta have some upload just for the protocol traffic tho.

[–] [email protected] 40 points 3 days ago

I would assume that the requests a torrent client sends in order to download data are not counted toward the upload amount for the torrent. When they say no upload, they mean that none of the data in the files they downloaded was shared with anyone else, making them a piece of shit leecher.

[–] [email protected] 20 points 3 days ago

Asking the real questions.

[–] [email protected] 87 points 3 days ago (3 children)

https://phys.org/news/2010-11-million-dollar-verdict-music-piracy-case.html

In all fairness, Meta should be assessed a fee of $250k for EACH pirated work.

This would amount to forfeiting all assets to doge.

[–] [email protected] 11 points 3 days ago (2 children)

They might end up having to pay more money than exists on the planet at that rate.

[–] [email protected] 9 points 2 days ago* (last edited 2 days ago) (1 children)

Good

Edit - See Gary Bowser

[–] [email protected] 3 points 2 days ago (1 children)

https://www.theguardian.com/games/2024/feb/01/the-man-who-owes-nintendo-14m-gary-bowser-and-gamings-most-infamous-piracy-case

[–] [email protected] 3 points 2 days ago

Yes, that one.

[–] [email protected] 6 points 2 days ago (1 children)

Assuming 2.6 MB per book.

81 TB would be 32,667,175 books.

At $250k per book that would come out to:

$8.17 trillion.
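
For anyone who wants to check the math, here is a rough sketch of the estimate above. The function name and the `mb_per_tb` parameter are made up for illustration. The 32,667,175 figure only comes out if you assume binary units (1 TB = 1024 × 1024 MB); with decimal units the total is a bit lower.

```python
def estimate_damages(total_tb, mb_per_book, fee_per_book, mb_per_tb=1024 * 1024):
    """Estimate the number of books in a download and the total fee.

    mb_per_tb defaults to binary units (an assumption); pass 1_000_000
    for decimal units instead.
    """
    # Whole books only, so drop the fractional remainder.
    books = int(total_tb * mb_per_tb / mb_per_book)
    return books, books * fee_per_book


# Figures from the comment: 81 TB, 2.6 MB per book, $250k per book.
books, total = estimate_damages(81, 2.6, 250_000)
print(f"{books:,} books -> ${total / 1e12:.2f} trillion")

# Same estimate with decimal units (1 TB = 1,000,000 MB).
books_dec, total_dec = estimate_damages(81, 2.6, 250_000, mb_per_tb=1_000_000)
print(f"{books_dec:,} books -> ${total_dec / 1e12:.2f} trillion")
```

With binary units this reproduces 32,667,175 books and about $8.17 trillion; with decimal units it gives 31,153,846 books and about $7.79 trillion. Either way the 2.6 MB average is the poster's assumption, not a measured figure.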

[–] [email protected] 11 points 2 days ago

[–] [email protected] 0 points 1 day ago (1 children)

And I'd guess all that money would then go to military funding, with Anna's Archive again getting nothing out of it?

[–] [email protected] 1 points 1 day ago

It would go to... Uh...

HEY SOMEONE PUT A DEAD CAT ON THE TABLE!

[–] [email protected] 117 points 3 days ago (3 children)

“Meta downloaded millions of pirated books from LibGen through the bit torrent protocol using a platform called LibTorrent. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint noted.

Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate "platform" called Libtorrent.

I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.

https://libgen.is/repository_torrent/

https://www.libtorrent.org/

The amended complaint makes it sound like Libtorrent is a private tracker website when it's just the application they were using on the publicly available torrents.

[–] [email protected] 48 points 3 days ago (2 children)

If someone were to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google for books? Are there any software projects better suited to this that understand synonyms and perhaps some context? I guess AI search, but aimed at the intermediate user.

Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.

[–] [email protected] 8 points 2 days ago

GPT, Meta, Deepseek and Google have probably all been trained on the data.

The problem is, training on the data, and actually training for knowledge of the data are VERY different things.

https://www.youtube.com/watch?v=_GkHZQYFOGM

[–] [email protected] 16 points 3 days ago

The Pirates of the Crown

[–] [email protected] 12 points 2 days ago

If buying ain't owning, then downloading...

oh wait, that's our slogan

[–] [email protected] 41 points 3 days ago (6 children)

Given the extent, it should be considered criminal, so $250k per offense, and the higher-ups who authorized the torrenting should get conspiracy charges at a minimum.

But this is America so they'll probably pay a small amount, for Meta, and a light slap on the wrist with a finger wagging.

[–] [email protected] 32 points 3 days ago (1 children)

What is Anna’s Archive?

[–] [email protected] 46 points 3 days ago (7 children)

It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries host copies of works of literature and science. Their legal status is murky at best, but it’s incredibly impractical to prosecute those accessing them.

[–] [email protected] 22 points 3 days ago (5 children)

Meta has open sourced every single one of their LLMs. They essentially gave birth to the whole open LLM scene.

If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from Hugging Face, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.

The copyright crew aren't the good guys here, even if it's spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.

[–] [email protected] 32 points 3 days ago (18 children)

Meta stole from everyone, including those that struggle to make ends meet, so it doesn’t matter that they gave you back some of it. Any moral qualms should evaporate when you consider that they did it to create shareholder value and the rest is philanthropy (aka pretend tax). As a socialist I believe that man is owed for his work and you can’t take from him even though technology makes it so easy.

[–] [email protected] 26 points 3 days ago* (last edited 3 days ago) (8 children)

As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.

Given the options between having open source LLMs and the US Govt banning non-corpo non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.

You're delusional if you think they will pay anyone, the only way zucc will pay is with a guillotine.

Corpos will make inter-platform deals that'll simply make all online data licensable for the right price and enrich each other so you can't avoid it while still actually being a career creative, but price out academic researchers and the public sector so that all fruits of it stay behind closed R&D doors and be free of ethics etc.

Continuing in your role as a useful idiot, you'll most likely also foot the bill for it via subsidies from your taxes to "develop the AI sector" in some anti-China dick measuring contest by the US.

You will then be sold this data back via proprietary chat bots via a monthly subscription and you better pay up because once it gets really good, it will become mandatory to use for just about any job, leaving you with no choice.

Or you can support FOSS LLMs.

[–] [email protected] 15 points 3 days ago (4 children)

If the existence of open source LLMs hinges on the benevolence of one of the few most cancerous tech companies in the world, maybe they're not really worth it?

This isn't about "heroes" and "villains". Facebook has been and has stayed the "villain". They've done something colossally illegal that any mere mortal would be sued to death for (by another "villainous" instance, the media system that has made piracy a necessity in the first place), and they're hoping to get away with it simply on technicalities and by having more money for better lawyers. Rules are rules; if you don't like them, maybe Facebook should try to change them (and not just for themselves, but for the rest of us too)?