Abstract
With the rapid growth of digital health records in the healthcare industry, artificial intelligence (AI) models can play a more effective role in managing patients. Machine-learning (ML) and deep-learning (DL) techniques are two approaches used to develop predictive models that improve clinical processes. These models are also implemented in medical imaging machines to equip them with intelligent decision systems that aid physicians in their decisions and increase the efficiency of routine clinical practice. Physicians who work with these machines need insight into how the implemented models operate in the background. More importantly, they need to be able to interpret the models' predictions, assess their performance, and compare them to identify the model with the best performance and the fewest errors. This review aims to provide an accessible overview of key evaluation metrics for physicians without AI expertise. We developed four real-world diagnostic AI models (two ML and two DL models) for breast cancer diagnosis using ultrasound images. We then reviewed 23 of the most commonly used evaluation metrics in plain terms for physicians. Finally, we calculated all of the metrics and applied them in practice to interpret and evaluate the outputs of the models. Accessible explanations and practical applications can empower physicians to interpret, evaluate, and optimize AI models effectively, helping to ensure safety and efficacy when these models are integrated into clinical practice.