Generative AI Chatbots for Reliable Cancer Information: Evaluating web-search, multilingual, and reference capabilities of emerging large language models

dc.contributor.author: Menz, B.D.
dc.contributor.author: Modi, N.D.
dc.contributor.author: Abuhelwa, A.Y.
dc.contributor.author: Ruanglertboon, W.
dc.contributor.author: Vitry, A.
dc.contributor.author: Gao, Y.
dc.contributor.author: Li, L.X.
dc.contributor.author: Chhetri, R.
dc.contributor.author: Chu, B.
dc.contributor.author: Bacchi, S.
dc.contributor.author: Kichenadasse, G.
dc.contributor.author: Shahnam, A.
dc.contributor.author: Rowland, A.
dc.contributor.author: Sorich, M.J.
dc.contributor.author: Hopkins, A.M.
dc.date.issued: 2025
dc.description.abstract: Recent advancements in large language models (LLMs) enable real-time web search, improved referencing, and multilingual support, yet ensuring they provide safe health information remains crucial. This perspective evaluates seven publicly accessible LLMs (ChatGPT, Co-Pilot, Gemini, MetaAI, Claude, Grok, and Perplexity) on three simple cancer-related queries across eight languages (336 responses: English, French, Chinese, Thai, Hindi, Nepali, Vietnamese, and Arabic). None of the 42 English responses contained clinically meaningful hallucinations, whereas 7 of 294 non-English responses did. 48 % (162/336) of responses included valid references, but 39 % of the English references were .com links, reflecting quality concerns. English responses frequently exceeded an eighth-grade reading level, and many non-English outputs were also complex. These findings reflect substantial progress over the past 2 years but reveal persistent gaps in multilingual accuracy, reliable reference inclusion, referral practices, and readability. Ongoing benchmarking is essential to ensure LLMs safely support global health information dichotomy and meet online information standards.
dc.description.statementofresponsibility: Bradley D. Menz, Natansh D. Modi, Ahmad Y. Abuhelwa, Warit Ruanglertboon, Agnes Vitry, Yuan Gao, Lee X. Li, Rakchha Chhetri, Bianca Chu, Stephen Bacchi, Ganessan Kichenadasse, Adel Shahnam, Andrew Rowland, Michael J. Sorich, Ashley M. Hopkins
dc.identifier.citation: European Journal of Cancer, 2025; 218:115274-1-115274-6
dc.identifier.doi: 10.1016/j.ejca.2025.115274
dc.identifier.issn: 0959-8049
dc.identifier.issn: 1879-0852
dc.identifier.orcid: Gao, Y. [0000-0001-6647-2156]
dc.identifier.orcid: Bacchi, S. [0000-0001-5130-8628]
dc.identifier.uri: https://hdl.handle.net/2440/146311
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/APP2030913
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/APP2008119
dc.rights: © 2025 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dc.source.uri: https://doi.org/10.1016/j.ejca.2025.115274
dc.subject: Artificial intelligence; large language model; health enquiries; cancer enquiries; language; English
dc.subject.mesh: Humans
dc.subject.mesh: Neoplasms
dc.subject.mesh: Multilingualism
dc.subject.mesh: Artificial Intelligence
dc.subject.mesh: Internet
dc.subject.mesh: Consumer Health Information
dc.subject.mesh: Generative Artificial Intelligence
dc.subject.mesh: Large Language Models
dc.title: Generative AI Chatbots for Reliable Cancer Information: Evaluating web-search, multilingual, and reference capabilities of emerging large language models
dc.type: Journal article
pubs.publication-status: Published
