Evaluation of AI Chatbots in Tooth Avulsion Management According to the International Association of Dental Traumatology Guidelines

Özdemir, Merve; Yıldırım Manav, Esra

doi:10.14744/lhhs.2026.38881

Merve Özdemir¹, Esra Yıldırım Manav²

¹Department of Pediatric Dentistry, Faculty of Dentistry, Lokman Hekim University, Ankara, Türkiye
²Department of Restorative Dentistry, Faculty of Dentistry, Lokman Hekim University, Ankara, Türkiye

Lokman Hekim Health Sciences 2026; 6(2): 255-263 DOI: 10.14744/lhhs.2026.38881


Abstract

Introduction: This study aimed to evaluate the extent to which widely used artificial intelligence (AI)-based chatbots adhere to the 2020 International Association of Dental Traumatology (IADT) guidelines for the management of tooth avulsion and to assess the accuracy of the bibliographic references (i.e., complete citation details including title, authors, journal, year, and DOI) they generate.
Materials and Methods: This cross-sectional observational study assessed four AI-based chatbots (ChatGPT-5.2, Perplexity AI, Gemini 2.5 Flash, and DeepSeek-v3.2) using ten standardized, clinician-directed avulsion scenarios aligned with the 2020 IADT guidelines. Each scenario was submitted once per chatbot, without iterative prompting, on 3 January 2026. Scenarios varied by extra-oral dry time, storage medium, apex maturity, dentition type, and replantation timing. Responses were evaluated using the 9-item IADT Compliance Index. Bibliographic accuracy was assessed using the reference hallucination score (RHS).
Results: No statistically significant difference was observed in overall normalized compliance scores among the chatbots (p=0.089). However, significant between-model differences emerged in technically critical domains, including root surface cleaning (p=0.017) and splint type and duration (p<0.001). ChatGPT-5.2 and Perplexity AI consistently outperformed Gemini 2.5 Flash and DeepSeek-v3.2. RHS values did not differ significantly between models (p=0.114), but all chatbots produced occasional reference hallucinations.
Discussion and Conclusion: Performance was higher in simpler scenarios, such as immediate replantation, whereas more complex conditions, particularly prolonged dry time and primary tooth avulsion, showed lower compliance and greater variability. Although the chatbots reproduced the general principles of avulsion management, between-model differences in technically critical domains and occasional reference hallucinations limit their reliability; thus, they should be used with clinician supervision.

Keywords: Artificial intelligence; Dental trauma; International Association of Dental Traumatology guidelines; Tooth avulsion