this post was submitted on 29 Nov 2023

If it were possible to run LLMs without a significant investment in GPU power, this problem wouldn't be very relevant. However, the bigger FOSS LLMs require a lot of compute to run.

Is there any automated technique (scripts, lookups, etc.) that can warn a user about opsec mistakes before the content is posted online? I'm asking this specifically for textual content.

Thanks


I didn't mention what I wanted clearly enough, so here goes:

I am looking to scan my own posts/comments mainly for stylometry statistics, but PII detection would be nice too. I'll deal with the user agent, cookies, IP, etc. myself.

My threat model is someone who wants to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and find me. The adversary here is someone who is interested in my identity and will use FOSS or some proprietary tools to link my identities.
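For the PII part, a pre-post check could be as simple as a few regular expressions run over the draft. The following is only a rough sketch: the pattern set and the `scan_for_pii` name are made up for illustration, and two regexes will miss most real-world PII (names, addresses, phone numbers and so on).

```python
import re

# Hypothetical, deliberately small pattern set; real PII detection needs far more coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_for_pii(text):
    """Return a list of (label, matched_text) pairs found in the draft."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


if __name__ == "__main__":
    draft = "Mail me at jane.doe@example.org from 192.168.1.10"
    for label, value in scan_for_pii(draft):
        print(f"warning: possible {label}: {value}")
```

Running this on a draft before pasting it into the comment box gives a cheap warning step. It says nothing about writing style, which needs a different kind of check.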


Edit: it seems there are packages available for Python and R that parse text and try to infer identity from stylometric data. I'll have to look into that, but it seems doable at a basic level.
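To give an idea of what "a basic level" might look like without any third-party package, here is a minimal sketch that builds a crude style profile (function-word rates, average sentence length, average word length) and compares two texts. The word list, the function names, and the distance measure are all assumptions chosen for illustration; dedicated stylometry tools use much richer features and proper statistics.

```python
import re
from collections import Counter

# Small hand-picked list of English function words (an assumption; real tools use longer lists).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for", "but", "with")


def style_profile(text):
    """Crude stylometric profile: function-word rates plus average sentence and word length."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)
    profile = {w: counts[w] / total for w in FUNCTION_WORDS}
    profile["avg_sentence_len"] = len(words) / (len(sentences) or 1)
    profile["avg_word_len"] = sum(len(w) for w in words) / total
    return profile


def function_word_distance(a, b):
    """Mean absolute difference in function-word rates between two profiles."""
    return sum(abs(a[w] - b[w]) for w in FUNCTION_WORDS) / len(FUNCTION_WORDS)


if __name__ == "__main__":
    known = style_profile("Text written under my real name goes here. It is just a sample.")
    draft = style_profile("The draft I am about to post goes here. It is also a sample.")
    print(function_word_distance(known, draft))
```

A smaller distance between a draft and writing published under a real name suggests the two are easier to link, though with short texts the numbers are very noisy and should be treated as a rough hint only.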

[–] [email protected] 5 points 11 months ago (1 children)

What sort of opsec mistakes do you have in mind? Something having to do with the content of the post like PII, credentials, credit card numbers, etc? Stylometry data points? Something about how they/you are posting like whether their user agent indicates they're using an outdated browser?

Also, whose posts are you hoping to scan? Your own? Are you a Lemmy instance runner who wants to warn your users or something?

What's your threat model? Who are you trying to guard against and what are you trying to keep them from getting from these posts?

[–] [email protected] 4 points 11 months ago

Thank you, I should have mentioned my threat model and needs more clearly.

I am looking to scan my own posts/comments mainly for stylometry statistics, but PII detection would be nice too. I'll deal with the user agent, cookies, IP, etc. myself.

My threat model is someone who wants to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and find me. The adversary here is someone who is interested in my identity and will use FOSS or some proprietary tools to link my identities.

[–] [email protected] 2 points 10 months ago

I believe the Umbrella app has something like what you want.

[–] [email protected] 1 points 10 months ago (1 children)

Do you want to scan content before it is posted, or do you want to scan content that you have already posted online?

You could self-host LanguageTool for its paraphrasing capability, which would vastly reduce stylometric correlations.

https://github.com/languagetool-org/languagetool

[–] [email protected] 1 points 10 months ago

Thank you, I'll take a look! That's a great idea!