this post was submitted on 04 Jan 2024

180 points (90.5% liked)

ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate | It was bad at recognizing relationships and needs selective training, researchers say. (arstechnica.com)

submitted 10 months ago by [email protected] to c/[email protected]

[–] [email protected] 98 points 10 months ago (6 children)

Why do people keep expecting a language model to be able to do literally everything? AI works best when it's a model trained to solve a specific problem. You can't just throw everything at a chatbot and expect it to have any sort of competence.

[–] [email protected] 39 points 10 months ago (2 children)

The average person isn't very smart. All they see is a magical black box that goes brr.

[–] [email protected] 25 points 10 months ago* (last edited 10 months ago)

My wife is a physician and I’ve talked with her about this with regard to healthcare in general. Most people still think of healthcare like visiting a wizard for a potion or somatic incantation.

So throw 2 black box-type problems at each other and I have no doubt that a lot of people would be surprised that the results are crap.

[–] [email protected] 15 points 10 months ago

Pretty much this.

[–] [email protected] 13 points 10 months ago (2 children)

Because you can talk to it and it's programmed to make you think it knows a lot and is capable of doing so much more.

People expect it to do more because ChatGPT was trained to make people expect it to do more.

It's all lies, of course. ChatGPT fails at anything beyond the simplest of tasks and can't use any new information because the internet is full of AI-generated text now, which is poison to training models. But it's good at pretending.

[–] [email protected] 16 points 10 months ago* (last edited 10 months ago) (1 children)

The thing that really annoys me is the people who are most enamoured with Chat GPT also seem to be the ones least capable of judging its accuracy and actual output quality.

I write for a living, at a newspaper. So naturally, some of the people in our company - sales people - wanted to test it. And they were delighted with the stuff it wrote. Which was terrible to read, factually incorrect, repetitive and just not something we’d put in the paper. But they loved it. Because they aren’t writers and don’t know how to write an engaging article with proper sources.

I tested it as well. Wanted to form my own opinion and read up on the limitations, how to write good prompts, etc. So I could give it a fair chance.

I had it write a basic 500 word article about things to see in our city, with information about the tourist info office. That’s something a first year intern can do in his second week with us.

Basically, it ended up ‘inventing’ two museums that don’t exist, it listed info for a museum on the other side of the country, it listed an ‘Olympic stadium’ (we never hosted the Olympics) and it gave a completely wrong address for the tourist info, even though it should have it.

It was factually incorrect in just about every sentence. But it all sounded plausible enough and was written with such confidence that anyone not from this city might assume it to be true.

I don’t want that fucking thing anywhere NEAR my newspaper. The sales people are pretty much monkeys with Chat GPT-typewriters, churning out drivel instead of Shakespeare.

[–] [email protected] 6 points 10 months ago* (last edited 10 months ago)

deleted

[–] [email protected] 5 points 10 months ago

the internet is full of ai generated text now, which is poison to training models. But it’s good at pretending.

This misconception shows up again and again. It's wishful thinking from people who want to think AI researchers are idiots and AIs are going to kill themselves.

These models aren't trained on "the internet". They don't just thoughtlessly rip everything that's ever been posted every time they want to make an updated bot. The vast bulk of training data was scraped years ago, predating the current tide of generative muck, and additions are carefully curated to avoid the exact thing you're talking about. A scrape of the 2018 internet is plenty, and will remain so for years and years.

[–] [email protected] 3 points 10 months ago

Because when you use the SotA model and best practices in prompting, it actually can do a lot of things really well, including diagnosing medical cases:

We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis

Use of GPT-4 to Diagnose Complex Clinical Cases

The OP study isn't using GPT-4. It's using GPT-3.5, which is very dumb. So the finding is less "LLMs can't diagnose pediatric cases" and more "we don't know how to do meaningful research on LLMs in medicine."

[–] [email protected] 3 points 10 months ago

These articles may be more so about “it’s not for medical uses you fucking morons” and less so “WOAH WHO KNEW MAN”

[–] [email protected] 1 points 10 months ago (1 children)

Because Google's Med-PaLM 2 is a medically trained chatbot that performs better than most med students, and some med professionals. Further training and refinement using new chatbot findings like mixture of experts and chain of thought are likely to improve results.

[–] [email protected] 5 points 10 months ago (1 children)

Exactly, Med-PaLM 2 was specifically trained to be a medical chatbot, not a general-purpose one like ChatGPT.

[–] [email protected] 1 points 10 months ago

Train on the internet, get results like what's on the internet. Is medical content on the internet good? No, it is shit, so it will give shit results.

These are great base models, and understanding larger context is always better for an LLM, but specialization is needed for these kinds of contexts.

[–] [email protected] 1 points 10 months ago

Especially not now that it has been nerfed to shit