The performance of ChatGPT and Google Bard in medical oncology board examination

Taha Koray Şahin; Murat Dinçer; Nuri Karadurmuş; Deniz Can Güven

doi:10.32552/actamedica.2026.1212

Authors

Taha Koray Şahin Department of Medical Oncology, Hacettepe University, Ankara, Türkiye https://orcid.org/0000-0002-3590-0426
Murat Dinçer Department of Medical Oncology, Eskişehir Osmangazi University, Eskişehir, Türkiye https://orcid.org/0000-0002-9146-813X
Nuri Karadurmuş Department of Medical Oncology, Gulhane School of Medicine, University of Health Sciences, Ankara, Türkiye https://orcid.org/0000-0003-3291-8062
Deniz Can Güven Department of Medical Oncology, Hacettepe University, Ankara, Türkiye https://orcid.org/0000-0002-6924-9467

Keywords:

large language models, ChatGPT, Google Bard, medical oncology, board, exam

Abstract

Objective: Artificial intelligence (AI) is transforming healthcare, and large language models (LLMs) such as ChatGPT and Google Bard have shown promise in providing medical information and decision support. LLMs have performed as well as or better than human participants in several board exams. However, their proficiency in complex clinical scenarios, such as those in oncology board exams, remains unclear. We aimed to assess the performance of three LLMs (ChatGPT 3.5, ChatGPT 4, and Google Bard) on the medical oncology board examination.

Materials and Methods: We used a question bank from the Turkish Society of Medical Oncology Board Exam comprising 290 multiple-choice questions from 2021-2023. ChatGPT 3.5, ChatGPT 4, and Google Bard were asked to answer each question in both Turkish and English and to provide an explanation and a confidence level with each answer.

Results: The overall accuracy was 59.3% for ChatGPT 4, 42.8% for ChatGPT 3.5, and 36.2% for Google Bard. The accuracy of ChatGPT 4 was significantly higher than that of ChatGPT 3.5 (p<0.001) and Google Bard (p<0.001), while the accuracy of ChatGPT 3.5 was higher than that of Google Bard (p<0.001). Only ChatGPT 4 was proficient in all three examination years (2021-2023). All LLMs performed better on the translated questions than on the original Turkish ones. The LLMs were more accurate on general knowledge questions than on case questions and were more confident in their answers to translated questions.

Conclusion: LLMs had moderate success in a medical oncology board exam, with only ChatGPT 4 demonstrating proficiency. The efficacy of LLMs in clinical decision-making requires further development, especially in native languages and complex case interpretations.
