this post was submitted on 25 Feb 2024
217 points (100.0% liked)

cross-posted from: https://lemmy.dbzer0.com/post/15112791

Hey y'all. I've been working on this little project ever since the recent spam wave started. This is a very basic Python automoderator bot which monitors the comments and posts federated into your instance for specific regex matches and then automatically reports, deletes, bans, etc.

The Bot setup is very simple, as you can just chuck its docker-compose entry into your existing lemmy one. You just need to fill in the relevant environment variables.

The bot works by constantly polling your incoming reports, posts and comments, and matching them against provided regex.
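To illustrate the matching step, here is a minimal sketch of checking incoming text against a list of regex filters. It is not threativore's actual code: the `Filter` class, the `first_match` helper and the sample comments are all made up for the example, and the polling itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical filter record; not threativore's real data model."""
    pattern: re.Pattern
    action: str  # e.g. "REMOVE" or "REPORT"
    reason: str


def first_match(text, filters):
    """Return the first filter whose regex is found in the text, or None."""
    for f in filters:
        if f.pattern.search(text):
            return f
    return None


# One filter, case-insensitive, mirroring the "trial period" example string.
filters = [Filter(re.compile(r"trial period", re.IGNORECASE), "REMOVE", "Spam comment")]

# Stand-in for a batch of freshly polled comment bodies.
comments = ["Sign up now for a free TRIAL PERIOD!", "Nice write-up, thanks."]

for body in comments:
    hit = first_match(body, filters)
    print(hit.action if hit else "OK", "-", body)
```

In a real bot the `comments` list would come from the instance's API on each polling pass, and the returned action would decide whether to report, remove or ban.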

I wanted to keep things simple for admins, so the bot configuration happens via a simple PM syntax. The README goes into details on this, but you basically send a message like this to the Bot to add a new filter:

threativore add comment filter: `trial period`
reason: `Spam comment`
action: `REMOVE`
description: `Known spam string`

All bot controls work the same way. Eventually I want to add a UI to it.
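For anyone curious how a message in that shape could be read by a program, here is a rough sketch. It assumes every line is a key followed by a colon and a backtick-quoted value; `parse_filter_message` and `FIELD_RE` are names invented for this example, and the real parser in the bot may work quite differently (check the README).

```python
import re

# One "key: `value`" pair per line; the value sits between backticks.
FIELD_RE = re.compile(r"^(?P<key>[\w ]+?):\s*`(?P<value>[^`]*)`\s*$")


def parse_filter_message(message):
    """Parse "key: `value`" lines into a dict (hypothetical helper)."""
    parsed = {}
    for line in message.strip().splitlines():
        m = FIELD_RE.match(line.strip())
        if m is None:
            raise ValueError(f"Unrecognised line: {line!r}")
        # Keys are lower-cased so "Action" and "action" are treated alike.
        parsed[m.group("key").strip().lower()] = m.group("value")
    return parsed


msg = (
    "threativore add comment filter: `trial period`\n"
    "reason: `Spam comment`\n"
    "action: `REMOVE`\n"
    "description: `Known spam string`"
)
parsed = parse_filter_message(msg)
print(parsed["threativore add comment filter"], "->", parsed["action"])
```

The first key doubles as the command name here, and its value is the regex to add.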

The bot is built with collaboration in mind. So you can add more people to help you maintain your filters (even if they're not admins), you can add users whose reports will be treated more seriously, and you can even mark users as "ham" (i.e. known not spammers) to prevent them ever being filtered.

This is just the very first release and I have a lot of ideas to improve it in the future. Here's some stuff in my roadmap which should make threativore a much more collaborative/crowdsourced process between multiple instance admins and the larger userbase. Stay tuned.

PRs and suggestions are welcome.

PS: The bot is already active on https://lemmy.dbzer0.com, so you can check the modlog for its actions.

top 12 comments
[–] [email protected] 26 points 7 months ago

Great work, thank you for making the effort!

[–] [email protected] 12 points 7 months ago

Great, thank you for this!

[–] [email protected] 12 points 7 months ago (1 children)

I'd really like a spam-signup type detector. Like if someone signs up and immediately starts posting or commenting way too much they should be given a few days ban, and if done again - permanently.
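A rough sketch of what such a check might look like, assuming the account creation time and a list of post/comment timestamps are available. The function name and all thresholds (one day, one hour, ten posts) are placeholders picked for illustration, not anything the bot currently does.

```python
from datetime import datetime, timedelta


def is_signup_burst(
    created_at,
    post_times,
    now,
    new_account_age=timedelta(days=1),
    window=timedelta(hours=1),
    max_posts=10,
):
    """Flag a new account that posts more than max_posts within the window."""
    # Accounts older than the cutoff are never flagged by this check.
    if now - created_at > new_account_age:
        return False
    recent = [t for t in post_times if now - t <= window]
    return len(recent) > max_posts


now = datetime(2024, 2, 25, 12, 0)
burst = [now - timedelta(minutes=i) for i in range(20)]  # 20 posts in 20 minutes

print(is_signup_burst(now - timedelta(hours=2), burst, now))   # two-hour-old account
print(is_signup_burst(now - timedelta(days=30), burst, now))   # month-old account
```

The escalation (temporary ban first, permanent on repeat) would sit on top of this and needs some record of earlier offences.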

[–] [email protected] 10 points 7 months ago (1 children)

That could affect power users who just moved instances.

[–] [email protected] 3 points 7 months ago

That's true, but there's still a difference between a power user and a spambot.

[–] [email protected] 9 points 7 months ago (2 children)

Another more general property that might be worth looking for would be substantially similar posts that get cross-posted to a wide variety of communities in a short period of time. That's a pattern that can have legitimate reasons but it's probably worth raising a flag to draw extra scrutiny.

One idea for making it computationally lightweight but also robust against bots "tweaking" the wording of each post might be to fingerprint each post based on rare word usage. Spam is likely to mention the brand name of whatever product it's hawking, which is probably not going to be a commonly used word. So if a bunch of posts come along that all use the same rare words all at once, that's suspicious. I could also easily see situations where this gives false positives, of course - if some product suddenly does something newsworthy you could see a spew of legitimate posts about it in a variety of communities. But no automated spam checker is perfect.
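Here is a small sketch of the rare-word idea. It assumes you already have a background word-frequency table built from ordinary posts and that the posts passed in are limited to a recent time window; the names `rare_words` and `suspicious_words`, the thresholds and the sample data are all invented for illustration.

```python
import re
from collections import Counter, defaultdict


def rare_words(text, background, max_count=1):
    """Words in the text seen at most max_count times in the background table."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return frozenset(w for w in words if background[w] <= max_count)


def suspicious_words(posts, background, min_communities=3):
    """Rare words that show up in at least min_communities different communities.

    posts is an iterable of (community, text) pairs from a recent time window.
    """
    communities_by_word = defaultdict(set)
    for community, text in posts:
        for w in rare_words(text, background):
            communities_by_word[w].add(community)
    return {w for w, c in communities_by_word.items() if len(c) >= min_communities}


# Toy background frequencies; a Counter returns 0 for words it has never seen.
background = Counter(
    {"buy": 50, "the": 1000, "best": 40, "today": 30,
     "try": 20, "now": 60, "is": 500, "great": 45}
)
posts = [
    ("tech", "Buy Zorblax today"),
    ("cats", "zorblax is the best"),
    ("news", "Try ZORBLAX now"),
    ("music", "great album today"),
]
print(suspicious_words(posts, background))
```

Only set lookups and counting are involved, so it stays cheap, and a hit here is a reason for extra scrutiny, not proof of spam.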

[–] [email protected] 11 points 7 months ago* (last edited 7 months ago) (1 children)

Feel free to submit a PR for these ideas. For post similarity, ML techniques can be used to calculate the "distance" between two posts, but I don't know if that could work computation-wise with an increasing amount of spam. Especially if spammers start using their own generative AI engines.

[–] [email protected] 3 points 7 months ago (1 children)

That's why I was suggesting such a simple approach, it doesn't require AI or machine learning except in the most basic sense. If you want to try applying fancier stuff you could use those basic word-based filters as a first pass to reduce the cost.

[–] [email protected] 5 points 7 months ago

There are likely a lot of anti-spam tactics we can employ. I hope people will help improve it.

[–] [email protected] 1 points 7 months ago

Honestly, my dream lemmy client would combine posts in my home and all feed based solely on the links in the post regardless of community or instance, and it would then provide UX to present the rest of the information if I choose to click into it.

Lemmy is designed around a concept that almost requires but definitely invites spamming links. Assuming you have good intentions and want to reach a wider federated audience, you would post your link to a few instances at once.

[–] [email protected] 4 points 7 months ago (1 children)

This is sort of related but do you have any plans on looking for coordinated voting?

[–] [email protected] 4 points 7 months ago

Not atm. Wouldn't even begin to know where to look.