this post was submitted on 08 Sep 2024
1235 points (98.3% liked)
Programmer Humor
A word document is xml
zipped xml!
Lots of file formats are just zipped XML.
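If you want to poke at one of these yourself, here's a rough Python sketch of the "it's just a zip with XML inside" idea. The archive is a tiny stand-in built in memory, and the member names (`label.xml`, `thumb.bin`) and the helper `list_xml_members` are made up for illustration, not taken from any real .docx or .lbx file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_members(data: bytes) -> dict[str, str]:
    """Map each .xml member of a zip archive to the tag of its root element."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                roots[name] = ET.fromstring(zf.read(name)).tag
    return roots


# Build a small stand-in archive in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("label.xml", "<label><text>hello</text></label>")
    zf.writestr("thumb.bin", b"\x00\x01")

demo = list_xml_members(buf.getvalue())
print(demo)
```

Pointing the same function at the bytes of a real file should work as long as the file really is a plain zip archive.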
I was ~~reverse engineering~~ fucking around with the LBX file format for our Brother label printer's software at work, because I wanted to generate labels programmatically, and they're zipped XML too. Terrible format, LBX, really annoying to work with. The parser in Brother P-Touch Editor is really picky too. A string is 1 character longer or shorter than the length you defined in an attribute earlier in the XML? "I've never seen this file format in my life," says P-Touch Editor.
Sounds like it’s actually using XSLT or some kind of content validation. Which to be honest sounds like a good practice.
Here's an example of a text object taken from the XML, if you're curious: https://clips.clb92.xyz/2024-09-08_22-27-04_gfxTWDQt13RMnTIS.png
EDIT: And with more complicated strings (like ones having numbers or symbols - just regular-ass ASCII symbols, mind you) there will be tens of , because apparently numbers and letters don't even work the same. Even line breaks have their own . And if the number of these and their charLen don't match what's actually in pt:data, it won't open the file.
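For anyone generating these files, a pre-flight check along these lines might save some pain. This is only a sketch: the element names `text`, `data` and `item` are placeholders (the real LBX names and namespaces aren't reproduced here), and the rule that the `charLen` values must sum to the length of the data string is an assumption based on the behaviour described above, not a documented spec.

```python
import xml.etree.ElementTree as ET


def char_lens_match(text_xml: str) -> bool:
    """Return True if the charLen attributes add up to the length of the data string.

    Assumes a simplified, hypothetical layout:
    <text><data>...</data><item charLen="N"/>...</text>
    """
    root = ET.fromstring(text_xml)
    data = root.findtext("data") or ""
    total = sum(int(item.get("charLen", "0")) for item in root.findall("item"))
    return total == len(data)


good = '<text><data>AB12</data><item charLen="2"/><item charLen="2"/></text>'
bad = '<text><data>AB123</data><item charLen="2"/><item charLen="2"/></text>'

print(char_lens_match(good), char_lens_match(bad))
```

The `bad` example is the "string is 1 character longer than declared" case: the data has five characters but the declared lengths only cover four.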
Is it because of the lower case Latin æ since it’s technically one character even if two bytes?
Nope, doesn't seem like it.
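For reference, the character-versus-byte distinction behind that question looks like this in Python. It only shows how the counts differ for æ; which of them (if any) P-Touch Editor actually uses for its length attribute is not something this sketch can tell you.

```python
s = "æ"

chars = len(s)                            # number of Unicode code points
utf8_bytes = len(s.encode("utf-8"))       # bytes when encoded as UTF-8
utf16_bytes = len(s.encode("utf-16-le"))  # bytes when encoded as UTF-16 (no BOM)

print(chars, utf8_bytes, utf16_bytes)
```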
What a mess… sounds like the devs got burned by various Unicode edge cases (RTL, etc.)
The future if text documents were JSON:
City_pic.png.xml