The performance of ChatGPT and Google Bard in medical oncology board examination

Authors

DOI:

https://doi.org/10.32552/actamedica.2026.1212

Keywords:

large language models, ChatGPT, Google Bard, medical oncology, board, exam

Abstract

Objective: Artificial intelligence (AI) is transforming healthcare, and large language models (LLMs) such as ChatGPT and Google Bard have shown promise in providing medical information and decision support. LLMs have performed similarly to or better than human participants in several board exams. However, their proficiency in complex clinical scenarios, such as those in oncology board exams, remains unclear. We aimed to assess the performance of three LLMs (ChatGPT 3.5, ChatGPT 4, and Google Bard) on the medical oncology board examination.

Materials and Methods: We used a question bank from the Turkish Society of Medical Oncology Board Exam comprising 290 multiple-choice questions from 2021-2023. ChatGPT 3.5, ChatGPT 4, and Google Bard were asked to answer each question in both Turkish and English and to provide an explanation and a confidence level with each answer.

Results: The overall accuracy was 59.3%, 42.8%, and 36.2% for ChatGPT 4, ChatGPT 3.5, and Google Bard, respectively. The accuracy of ChatGPT 4 was significantly higher than that of ChatGPT 3.5 (p<0.001) and Google Bard (p<0.001), while the accuracy of ChatGPT 3.5 was higher than that of Google Bard (p<0.001). Only ChatGPT 4 was proficient in all three examination years (2021-2023). All LLMs performed better on the translated (English) questions than on the original Turkish ones. The LLMs were more accurate on general knowledge questions than on case questions and were more confident in their answers to translated questions.

Conclusion: LLMs had moderate success in a medical oncology board exam, with only ChatGPT 4 demonstrating proficiency. The use of LLMs in clinical decision-making requires further development, especially for questions posed in native languages and for complex case interpretation.
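
The accuracy figures reported in the Results can be sanity-checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis: the abstract does not state which statistical test was used or which denominator each percentage refers to. It assumes every accuracy is computed over the 290-question bank and uses an unpaired, pooled two-proportion z-test. The names `N_QUESTIONS`, `approx_correct`, and `two_proportion_z` are hypothetical helpers introduced here.

```python
import math

# Assumption: each reported accuracy is computed over the 290-question bank.
N_QUESTIONS = 290

# Overall accuracies (percent) as reported in the abstract.
REPORTED = {"ChatGPT 4": 59.3, "ChatGPT 3.5": 42.8, "Google Bard": 36.2}


def approx_correct(pct, n=N_QUESTIONS):
    """Nearest whole number of correct answers implied by a percentage."""
    return round(pct / 100 * n)


def two_proportion_z(c1, n1, c2, n2):
    """Unpaired, pooled two-proportion z-test; returns (z, two-sided p).

    Assumption: the samples are treated as independent. The models actually
    answered the same questions, so a paired test would be more appropriate,
    but it needs per-question results that the abstract does not give.
    """
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Approximate correct-answer counts implied by the reported percentages.
counts = {name: approx_correct(pct) for name, pct in REPORTED.items()}

if __name__ == "__main__":
    for a, b in [("ChatGPT 4", "ChatGPT 3.5"),
                 ("ChatGPT 4", "Google Bard"),
                 ("ChatGPT 3.5", "Google Bard")]:
        z, p = two_proportion_z(counts[a], N_QUESTIONS, counts[b], N_QUESTIONS)
        print(f"{a} vs {b}: z = {z:.2f}, two-sided p = {p:.3g}")
```

Because the counts are reconstructed from rounded percentages and the test is unpaired, the output need not reproduce the p-values reported in the abstract. A paired analysis on per-question results would be the closer analogue of a same-question comparison.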

Published

2026-06-28

How to Cite

1.
Şahin TK, Dinçer M, Karadurmuş N, Güven DC. The performance of ChatGPT and Google Bard in medical oncology board examination. Acta Medica [Internet]. 2026 Jun. 28 [cited 2026 Jun. 29];57(2):139-45. Available from: https://actamedica.org/index.php/actamedica/article/view/1212

Issue

Section

Original Article