Comparing of Data Mining Techniques for Predicting In-Hospital Mortality Among Patients with COVID-19

  • Mostafa Shanbehzadeh Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
  • Azam Orooji Department of Advanced Technologies, School of Medicine, North Khorasan University of Medical Science, North Khorasan, Iran.
  • Hadi Kazemi-Arpanahi Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran.
Keywords: COVID‐19; Coronavirus; Artificial intelligence; Machine learning; Mortality


Introduction: The COVID-19 epidemic is currently confronting health care systems worldwide with many uncertainties and unexpected challenges in medical decision-making and the effective sharing of medical resources. Machine Learning (ML)-based prediction models could help to overcome these uncertainties.

Objective: This study aims to train several ML algorithms to predict COVID-19 in-hospital mortality and to compare their performance in order to choose the best-performing algorithm. In addition, the contributing factors were scored using several feature selection methods.

Material and Methods: Using a single-center registry, we studied the records of 1353 hospitalized patients with confirmed COVID-19 from Ayatollah Taleghani hospital, Abadan city, Iran. We applied six feature scoring techniques and nine well-known ML algorithms. To evaluate the models' performance, metrics derived from the confusion matrix were calculated.
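As an illustration of the evaluation step, the following minimal Python sketch shows how metrics such as accuracy and sensitivity can be derived from a binary confusion matrix. The function names (`confusion_counts`, `confusion_metrics`) and the toy labels are hypothetical and chosen only for this example; they are not the study's data or code, and the study may have used a different toolkit.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary outcome (positive = death, by assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Derive common performance metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy labels, invented for illustration only (1 = died, 0 = survived).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = confusion_metrics(*counts)
```

In this toy example accuracy is higher than sensitivity because the negative class dominates, which is why sensitivity is worth reporting alongside accuracy when deaths are the minority outcome.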

Results: The study included 1353 patients, with more men than women (742 vs. 611), and the median age was 57.25 (interquartile range 18-100). After feature scoring, out of 54 variables, absolute neutrophil/lymphocyte count and loss of taste and smell were found to be the top three predictors. In contrast, platelet count, magnesium, and headache had the lowest importance for predicting COVID-19 mortality. Experimental results indicated that the Bayesian network algorithm, with an accuracy of 89.31% and a sensitivity of 64.2%, was the most successful in predicting mortality.

Conclusion: ML provides a reasonable level of accuracy in predicting COVID-19 in-hospital mortality. Using ML-based prediction models therefore facilitates more responsive health systems and would be beneficial for the timely identification of vulnerable patients, informing appropriate judgment by health care providers.

Abbreviations: Coronavirus Disease 2019 (COVID‐19), World Health Organization (WHO), Machine Learning (ML), Artificial Intelligence (AI), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Locally Weighted Learning (LWL), Clinical Decision Support System (CDSS)