Evaluating AI-generated patient education materials for spinal surgeries: comparative analysis of readability and DISCERN quality across ChatGPT and DeepSeek models

dc.contributor.authorZhou, M.
dc.contributor.authorPan, Y.
dc.contributor.authorZhang, Y.
dc.contributor.authorSong, X.
dc.contributor.authorZhou, Y.
dc.date.issued2025
dc.descriptionData source: Supplementary material, https://doi.org/10.1016/j.ijmedinf.2025.105871
dc.description.abstractBackground: Access to patient-centered health information is essential for informed decision-making. However, online medical resources vary in quality and often fail to accommodate differing degrees of health literacy. This issue is particularly evident in surgical contexts, where complex terminology obstructs patient comprehension. With the increasing reliance on AI models for supplementary medical information, the reliability and readability of AI-generated content require thorough evaluation. Objective: This study aimed to evaluate four natural language processing models (ChatGPT-4o, ChatGPT-o3 mini, DeepSeek-V3, and DeepSeek-R1) in generating patient education materials for three common spinal surgeries: lumbar discectomy, spinal fusion, and decompressive laminectomy. Information quality was evaluated using the DISCERN score, and readability was assessed through Flesch-Kincaid indices. Results: DeepSeek-R1 produced the most readable responses, with Flesch-Kincaid Grade Level (FKGL) scores ranging from 7.2 to 9.0, followed by ChatGPT-4o. In contrast, ChatGPT-o3 exhibited the lowest readability (FKGL > 10.4). The DISCERN scores for all AI models were below 60, classifying the information quality as "fair," primarily due to insufficient cited references. Conclusion: All models achieved merely a "fair" quality rating, underscoring the necessity for improvements in citation practices and personalization. Nonetheless, DeepSeek-R1 and ChatGPT-4o generated more readable surgical information than ChatGPT-o3. Given that enhanced readability can improve patient engagement, reduce anxiety, and contribute to better surgical outcomes, these two models should be prioritized for assisting patients in the clinical setting.
Limitations & Future directions: This study is limited by the rapid evolution of AI models, its exclusive focus on spinal surgery education, and the absence of real-world patient feedback, which may affect the generalizability and long-term applicability of the findings. Future research should explore interactive, multimodal approaches and incorporate patient feedback to ensure that AI-generated health information is accurate, accessible, and supportive of informed healthcare decisions.
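The abstract reports readability via the Flesch-Kincaid Grade Level, which is a standard formula: FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. As an illustration only (the study's actual scoring tool is not described here), a minimal Python sketch of the metric might look like the following; the syllable counter is a rough vowel-group heuristic, so scores are approximate:

```python
import re

def fkgl(text):
    """Approximate Flesch-Kincaid Grade Level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59.
    Syllables are estimated with a vowel-group heuristic, not a dictionary.
    """
    # Split into sentences on terminal punctuation, dropping empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Count runs of vowels as syllables; treat a trailing 'e' as silent.
        count = len(re.findall(r"[aeiouy]+", word.lower()))
        if word.lower().endswith("e") and count > 1:
            count -= 1
        return max(count, 1)

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * total_syllables / len(words)
            - 15.59)
```

On this scale, plain short-sentence text scores near early grade levels, while dense clinical prose scores well above the grade 8 threshold often recommended for patient materials, which is the contrast the study's FKGL comparison captures.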
dc.identifier.citationInternational Journal of Medical Informatics, 2025; 198(105871):1-5
dc.identifier.doi10.1016/j.ijmedinf.2025.105871
dc.identifier.issn1386-5056
dc.identifier.issn1872-8243
dc.identifier.urihttps://hdl.handle.net/11541.2/42395
dc.language.isoen
dc.publisherElsevier
dc.rightsCopyright 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)
dc.source.urihttps://doi.org/10.1016/j.ijmedinf.2025.105871
dc.subjectai-generated health information
dc.subjectspinal surgery education
dc.subjectpatient health literacy
dc.subjectreadability
dc.titleEvaluating AI-generated patient education materials for spinal surgeries: comparative analysis of readability and DISCERN quality across ChatGPT and DeepSeek models
dc.typeJournal article
pubs.publication-statusPublished
ror.fileinfo12300561790001831 13300561780001831 Open Access Published Version
ror.mmsid9916959237101831

Files

Original bundle
Name: 9916959237101831_12300561790001831_Evaluating AI-generated patient education materials for spinal surgeries.pdf
Size: 769.96 KB
Format: Adobe Portable Document Format
Description: Published version