ChatGPT isn’t ready for medical ‘tests’, getting 83% of cases wrong

A new study published in JAMA Pediatrics has thrown cold water on the hopes of some for AI-powered medical diagnoses, revealing that the popular language model ChatGPT-4 performed poorly in evaluating children’s health cases. According to a report by Ars Technica, the study found a staggering 83% error rate, underscoring the dangers of relying on unvetted AI in high-stakes situations like healthcare.
Researchers from Cohen Children’s Medical Center in New York tested ChatGPT-4 against 100 anonymised paediatric case studies, covering a range of common and complex conditions. The chatbot’s dismal performance, missing vital clues and providing inaccurate diagnoses in the overwhelming majority of cases, raises serious concerns about the readiness of current AI technology for medical applications.
Out of 100 cases, ChatGPT provided correct answers in only 17 instances. In 72 cases, it gave inaccurate responses, and in the remaining 11 cases, it did not fully capture the correct diagnosis. Among the 83 incorrect diagnoses, 57 percent (47 cases) were related to the same organ system as the correct answer, as per the report.
How was ChatGPT evaluated?
During ChatGPT’s evaluation, the researchers inserted the pertinent text of the medical cases into the prompt. Two qualified physician-researchers then assessed the AI-generated responses, categorising them as correct, incorrect, or “did not fully capture the diagnosis.” In instances where ChatGPT fell into the latter category, it often provided a clinically related condition that was too broad or insufficiently specific to be deemed the accurate diagnosis. For example, in one case ChatGPT identified a branchial cleft cyst—a lump in the neck or below the collarbone—when the correct diagnosis was Branchio-oto-renal syndrome. According to the report, this syndrome is a genetic condition leading to abnormal tissue development in the neck, along with malformations in the ears and kidneys. Notably, one of the indicators of this condition is the occurrence of branchial cleft cysts.
However, the study did note that ChatGPT can be used as a supplementary tool. As part of the findings, the study stated that “LLM-based chatbots could be used as a supplementary tool for clinicians in diagnosing and developing a differential list for complex cases.”