this post was submitted on 25 Oct 2023
452 points (97.3% liked)

Lemmy Shitpost

top 27 comments
[–] [email protected] 42 points 1 year ago (2 children)

Not sure why the devs have so much trouble parsing this. I’m not sure if it’s an API thing or a front-end issue.

[–] [email protected] 23 points 1 year ago* (last edited 1 year ago) (3 children)

Something somewhere is running an htmlspecialchars() or equivalent on whatever you input, probably as an attempt at "sanitizing" the text entered in titles/posts/comments. You know, to keep me from just inserting a script tag with src='http://pwned.ru/fu.js' into a comment and having it do something naughty to anyone who loads the page.

I'm certain these are being stored in the database as an & amp;, but they're not being decoded back into an ampersand character upon display.
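A minimal sketch of that failure mode, in Python rather than PHP. `html.escape` stands in for `htmlspecialchars()`; the `sanitize` helper and the sample title are assumptions for illustration, not Lemmy's actual code.

```python
import html


def sanitize(text: str) -> str:
    """Hypothetical stand-in for the backend's htmlspecialchars()-style step."""
    return html.escape(text, quote=True)


title = "D&D <3"

# Escaped once on input: this is what would end up stored.
stored = sanitize(title)

# If the stored value is escaped again on output (or shown without being
# decoded), the entity itself becomes visible text.
escaped_twice = sanitize(stored)

print(stored)         # D&amp;D &lt;3
print(escaped_twice)  # D&amp;amp;D &amp;lt;3
```

A browser that receives `stored` as markup draws "D&D <3"; one that receives `escaped_twice`, or that inserts `stored` as plain text, draws the literal entity.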

[–] [email protected] 5 points 1 year ago (3 children)

I know, just sanitize it again. .Replace(“&”, &), Regex.Remove(amp;), if(.Contains(“amp;”))

[–] [email protected] 6 points 1 year ago

The same with < and &lt; please

[–] [email protected] 3 points 1 year ago

No. This is just escaped HTML. So you can just unescape it like any other HTML.
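For what it's worth, a short Python sketch of "just unescape it", using the standard library's `html.unescape`. The sample string is an assumption; where Lemmy would actually do this is not something this sketch claims to know.

```python
import html

# What an escaped title might look like when it comes back from storage.
stored = "D&amp;D &lt;3"

# One call undoes exactly one level of escaping.
decoded = html.unescape(stored)
print(decoded)  # D&D <3

# Round trip: escaping the decoded text gives the stored form again.
print(html.escape(decoded) == stored)  # True
```

The decoded string is plain text again, so whatever displays it still has to treat it as text and not as markup.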

[–] [email protected] 1 points 1 year ago

Please be kidding lol

[–] [email protected] 3 points 1 year ago (2 children)

would it not be possible for whatever's decoding it to run arbitrary Javascript if done wrong? maybe that's why it doesn't exist yet?

[–] [email protected] 9 points 1 year ago

The decode really, really, really should not be happening client side in JavaScript. The backend should handle it before handing the text to the user's browser. You are correct; if this is done client side it means a bad actor can mess with it and/or include an injection attack of some sort.

Nothing client side should ever handle user input, except perhaps convenience features like flagging incomplete fields or kicking the cursor to the next input element when one is full (e.g. for phone numbers). Anything client side can be fucked with by the client. Validation needs to happen on the server side, before committing the input to the database (or doing whatever it's going to do with it).
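A rough sketch of what server-side handling could look like, in Python. Everything here is hypothetical: the function names, the length limit and the "store raw, escape on output" arrangement are assumptions for illustration, not how Lemmy does it.

```python
import html

MAX_TITLE_LENGTH = 200  # assumed limit, purely illustrative


def validate_title(raw: str) -> str:
    """Server-side check run before anything is written to the database."""
    title = raw.strip()
    if not title:
        raise ValueError("title must not be empty")
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError("title is too long")
    if any(ord(ch) < 32 for ch in title):
        raise ValueError("title contains control characters")
    return title


def render_title(stored: str) -> str:
    """Escape exactly once, at the point where the text becomes HTML."""
    return "<h1>" + html.escape(stored) + "</h1>"


title = validate_title("  D&D <3  ")
print(render_title(title))  # <h1>D&amp;D &lt;3</h1>
```

The client can still flag an empty field for convenience, but the server repeats the check because it cannot trust what arrives.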

[–] [email protected] 6 points 1 year ago (1 children)

There are a lot of potential pitfalls any time you accept text input from a user, store it, and regurgitate it back to display on a user's browser. The thing is, HTML (and every scripting language embedded in it) is just text. So regular words and a block of JavaScript that makes polka-dotted hippos dance across your screen and incessantly play the Hamster Dance song at 200% volume are, without protections, input and stored exactly the same way. Preventing ne'er-do-wells from doing injection attacks with SQL calls, HTML, control and escape characters, JavaScript, etc. is part of a whole industry.

It appears Lemmy does filter out raw HTML tags, at least. I tried to insert one in my last comment just for illustration and it was silently removed from the input.
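To illustrate the "it's all just text" point, here is a small Python sketch. The in-memory list is a stand-in for a database and the helper names are made up; the script URL is the example one from earlier in the thread.

```python
import html

comments = []  # hypothetical in-memory stand-in for the database


def store_comment(text: str) -> None:
    # Plain words and a script payload are stored the same way: as a string.
    comments.append(text)


def render_comment(text: str) -> str:
    # Escaping on output turns any markup in the comment into inert text.
    return "<p>" + html.escape(text) + "</p>"


store_comment("Regular words")
store_comment("<script src='http://pwned.ru/fu.js'></script>")

rendered = [render_comment(c) for c in comments]
for line in rendered:
    print(line)
```

Without the `html.escape` call the second comment would reach the browser as a live script tag, which is the stored cross-site scripting case.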

[–] [email protected] 1 points 1 year ago (1 children)

I can't use <3 in a post title without it getting mangled.

[–] [email protected] 1 points 1 year ago

That's because the sanitization here is shit, but I bet you'd rather have that than be attacked by stored cross-site scripting attacks :)

[–] [email protected] 1 points 1 year ago

There's a Git issue on this

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)
[–] [email protected] 13 points 1 year ago

&AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;AMP;

[–] [email protected] 11 points 1 year ago (1 children)

Yes, it's annoying. However, there is a fullwidth ampersand available, as well as lots of other curly characters. I tried adding combining backslashes to those that miss that final stroke of ‘&’ but I gave up as they don't align consistently across platforms.

&﹠Ֆ𐒒౭𐓯꯴𖩒᱒𝛂̊ɑͦ𝛂ͦ

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago) (1 children)

I'm seeing the last three as (a/alpha)-with-circle-above ... is this intentional?

[–] [email protected] 3 points 1 year ago (1 children)

Yes. The first one is a standard diacritical circle as seen in Swedish, the other two are a combining “o” above. The middle one is a standard alpha while the others are math script bold alphas. This is an attempt to create an ersatz “&” without the character 0x26 that appears as &amp; in post titles and monospace text. The fullwidth ampersand is obviously the closest but it appears with space around it so “D&D” looks more like “D & D” than “D&D”. Hence the attempts with weird characters (D𝛂ͦD, D꯴D, D𐓯D...). Unfortunately, some OSs (notably Windows 10) incorrectly draw most combining diacritics offset to the right of the base letter as a zero-width character.

There is a pair of single (non-combining) characters I can demonstrate it with: a combination of K + combining comma below is supposed to look like Ķ but it looks kinda like Қ on Windows. However, Windows stacks combining diacritics so you can create impressively tall Z̷͍̝̿̈́͂̅͝͝͡͡a̷̢̡̢̙̬͕̯̹͍͖̰͕̦͉̘̪͗͂̑͂̑̾́̀̍͗̾́̄̕͝͝l̵̢̨̢̳̻̩̳̘͖͎͎̘̗̬̞͈͌g̸̨͈̠͔̖͎͇̟̍͗̈́̏̿̍̍͗̕o̷͓͎̙͉̯̱̊́͐͘ ̵̣͓̥͈̗̯̹̭̪͕͓̳̈́͌̐̈́̓̈́͑̌̚͡͝t̶̨̛̩̭̬̲͓̣͔̪̠̙̞͚̒͒̽͟͝ͅe̵̛̘͈̜͖̅͊́̅̌̇̒̐͑̚͠͝x̴͌͗͐͌̏͛̕͡͝ͅt̴̡̤̻̺̣̥̝̤̼̺̦̣͎̟́̊̂̈́̃͑̈́̈̈́ while some other platforms just print them over each other like a typewriter does.
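If anyone wants to poke at these characters, a small Python sketch with the standard `unicodedata` module follows. The code points are written as escapes so nothing depends on how this page renders them. One caveat I'm fairly sure of: the precomposed Ķ (U+0136) is named "K WITH CEDILLA" in Unicode and decomposes to K + U+0327, even though it is usually drawn with a comma below, so the sketch uses U+0327 rather than U+0326.

```python
import unicodedata

FULLWIDTH_AMPERSAND = "\uFF06"
SMALL_AMPERSAND = "\uFE60"

# Compatibility normalization (NFKC) folds both lookalikes back to U+0026.
for ch in (FULLWIDTH_AMPERSAND, SMALL_AMPERSAND):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", repr(folded))

# Base letter + combining mark versus the single precomposed character.
decomposed = "K\u0327"  # K + COMBINING CEDILLA
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), unicodedata.name(composed))


def count_combining(text: str) -> int:
    """Count the combining marks in a string, e.g. to measure Zalgo text."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


print(count_combining("a\u030A\u0301"))  # 2
```

A side effect worth knowing: any software that applies NFKC to a title would turn the fullwidth ampersand back into a plain "&".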

[–] [email protected] 2 points 1 year ago

oh, sorry, i see it now.... although i prefer D꯴D

[–] [email protected] 10 points 1 year ago

Iñtërnâtiônàlizætiøn

[–] [email protected] 7 points 1 year ago

Lol this is clever and quick.

[–] [email protected] 5 points 1 year ago (1 children)
[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

The problem exists in post titles and fullwidth text.

Ampersand (&): &amp;
Less-than sign (ᐸ) : &lt;

(The characters in parentheses are closest Unicode lookalikes)

[–] [email protected] 5 points 1 year ago

Oh yeah! But what about a real encoding problem meme?

We're probably dozens who'd appreciate one!

[–] [email protected] 5 points 1 year ago

It is already fixed and will be rolled out soon.

https://github.com/LemmyNet/lemmy/issues/3987

[–] [email protected] 4 points 1 year ago

Ah, an En Passant in the wild