this post was submitted on 15 Oct 2024
493 points (96.4% liked)

Technology

[–] [email protected] 9 points 1 month ago* (last edited 1 month ago) (3 children)

So I keep seeing people reference this... and I found it curious that LLMs would have problems with it. So I asked them... Several of them...

Outside of this image... Codestral (my default) actually got it correct and didn't talk itself out of being correct... But that's no fun, so I asked 5 others at once.

What's sad is that Dolphin Mixtral is a 26.44GB model...
Gemma 2 is the 5.44GB variant
Gemma 2B is the 1.63GB variant
LLaVA Llama3 is the 5.55GB variant
Mistral is the 4.11GB variant

So I asked Codestral again because why not! And this time it talked itself out of being correct...

Edit: fixed newline formatting.

[–] [email protected] 2 points 1 month ago* (last edited 1 month ago)

Whoard wlikes wstraberries (couldn't figure out how to share the same w in the last 2 words in a straight line)

[–] [email protected] 1 points 1 month ago (1 children)

Interesting. . . I'd say Gemma 2B wasn't actually wrong - it just didn't answer the question you asked! I wonder if they have this problem with other letters - like maybe it's something to do with how we say w as double-you . . . But maybe not, because they seem to be underestimating rather than overestimating. But yeah, I guess the fuckers just can't count. You'd think a question using the phrase 'How many . . .' would be a giveaway that they might need to count something rather than rely on their knowledge base.
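
For comparison, actually counting is a couple of lines of code. Here's a minimal sketch in Python. The word "strawberry" and the helper name `count_letter` are just my own illustrative assumptions, not the prompt or anything the models were given in this thread.

```python
def count_letter(text: str, letter: str) -> int:
    """Count how often a single letter occurs in text, ignoring case."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    # Lowercase both sides so 'W' and 'w' are treated as the same letter.
    return text.lower().count(letter.lower())


# Hypothetical example word, chosen only for illustration.
print(count_letter("strawberry", "r"))  # prints 3
```

It walks the characters one by one, so it can't "talk itself out of" the answer the way the models above did.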

[–] [email protected] 1 points 1 month ago

I’d say Gemma 2B wasn’t actually wrong

I call that talking itself out of being correct.

[–] [email protected] 1 points 1 month ago

LOL 😆😅! I totally made it up! And it worked! So maybe it's not just R's that it has trouble counting. It's any letter at all.