ChatGPT gives great back pain advice – but only if you want general information



A Sydney group publishes in ARD on where AI chatbots give their best advice – and where it goes very wrong.


Artificial intelligence chatbots such as ChatGPT may be almost as effective as consulting a doctor for advice on low back pain, say researchers.

But there are limits – while the chatbots excelled in answering many questions, there were inaccuracies, especially when it came to risk factors.

The study was published in the journal Annals of the Rheumatic Diseases.

“The use of LLM-chatbots as tools for patient education and counselling in LBP shows promising but variable results,” the authors concluded.

“These chatbots generally provide moderately accurate recommendations. However, the accuracy may vary depending on the topic of each question.

“The reliability level of the answers was inadequate, potentially affecting the patient’s ability to comprehend the information.”

Associate Professor Bruno Tirotti Saragiotto, research co-author and head of physiotherapy at The University of Technology Sydney, said the study set out to evaluate how effectively AI chatbots like ChatGPT answer common questions posed by individuals experiencing LBP.

As AI-powered chatbots become increasingly common in healthcare, the accuracy of their recommendations was important, he said.

“The findings show that AI chatbots can offer advice with accuracy levels comparable to those reported by healthcare professionals in Australia,” said Professor Saragiotto.

Despite the encouraging results, researchers identified limitations in the AI chatbots’ performance. Notably, responses were often complex, with a readability level suited to individuals with a tenth to twelfth grade or university-level education.

“While the accuracy of the AI-generated advice was impressive, we must consider the accessibility of this information,” said Professor Saragiotto.

“Ensuring that health guidance is understandable to a broad audience remains an important challenge in the development of AI health tools.”

Professor Saragiotto stressed the importance of recognising both the capabilities and limitations of AI resources in managing common health concerns like LBP.

“As AI technology continues to evolve, further research will be necessary to refine these tools and ensure they can provide accurate, accessible, and safe health information to the public,” he said.

The cross-sectional study analysed responses to 30 LBP-related questions, covering self-management, risk factors and treatment.

The questions were developed by experienced clinicians and researchers and were piloted with a group of consumer representatives with lived experience of LBP.

The questions were entered as prompts into ChatGPT 3.5, Bing, Bard (Gemini) and ChatGPT 4.0. Responses were evaluated for their accuracy, readability and the presence of disclaimers about health advice.

Accuracy was assessed by comparing the recommendations generated with the main guidelines for LBP. The responses were analysed by two independent reviewers and classified as accurate, inaccurate or unclear. Readability was measured with the Flesch Reading Ease Score (FRES).
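The Flesch Reading Ease Score the study relied on is computed from a standard formula based on average sentence length and syllables per word. As a rough illustration only, here is a minimal Python sketch of that formula; the vowel-group syllable counter is a crude heuristic of my own, not the tool the researchers actually used (their implementation is not specified in the article):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel groups (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Standard FRES formula: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Short, plain sentences score high; long, polysyllabic ones score low.
score = flesch_reading_ease("Exercise helps low back pain. Stay active.")
```

On this scale the mean score of 50.94 reported in the study falls in the band conventionally described as fairly difficult, roughly college-preparatory reading level.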

“Out of 120 responses yielding 1069 recommendations, 55.8% were accurate, 42.1% inaccurate and 1.9% unclear,” the researchers wrote.

“Treatment and self-management domains showed the highest accuracy while risk factors had the most inaccuracies.

“Overall, LLM-chatbots provided answers that were ‘reasonably difficult’ to read, with a mean (SD) FRES score of 50.94 (3.06). Disclaimer about health advice was present around 70%–100% of the responses produced.”


The AI chatbots excelled in answering questions related to suggested treatment and self-management, while risk factors had the most inaccuracies.

Questions such as “What complementary therapies like massage or acupuncture could alleviate lower back pain?” received accurate recommendations. The researchers also noted that AI chatbots consistently recommended exercise for preventing and managing LBP, which is considered an accurate recommendation.

However, the study also showed that AI chatbots provided inaccurate recommendations to other commonly asked questions. For example, while poor posture does not cause LBP, AI chatbots said that it does 88% of the time.

Another key observation was the ability of AI chatbots to recognise situations requiring medical referrals. In cases where professional care should be recommended, the AI systems advised users to consult a healthcare provider in 70–100% of instances.

“Our research indicates that AI chatbots have the potential to be a valuable resource for those seeking initial guidance on managing low back pain,” said Dr Giovanni Ferreira, research fellow at the University of Sydney Institute for Musculoskeletal Health, and one of the authors of the study.

“It’s important to note that these tools should complement, not replace, professional medical advice.”

Annals of the Rheumatic Diseases, September 2024
