Comparison of Two Methods, Gradient Boosting and Extreme Gradient Boosting to Predict Survival in Covid-19 Data

Nadiasadat Taghavi Razavizadeh; Maryam Salari; Mostafa Jafari; Ehsan Sabaghian; Vahid Ghavami

doi:10.18502/jbe.v9i3.15450

Nadiasadat Taghavi Razavizadeh Department of Biostatistics, School of Health, Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
Maryam Salari Department of Biostatistics, School of Health, Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
Mostafa Jafari Department of Internal Diseases, Mashhad University of Medical Sciences, Mashhad, Iran.
Ehsan Sabaghian Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium, and VIB Center for Plant Systems Biology, Ghent, Belgium.
Vahid Ghavami Department of Biostatistics, School of Health, Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.

Keywords: Gradient boosting algorithm; Extreme gradient boosting algorithm; Survival analysis; Covid-19.

Abstract

Introduction: The present study discusses the importance of having a predictive method to determine the prognosis of patients with diseases like Covid-19. This method can assist physicians in making treatment decisions that improve survival rates and avoid unnecessary treatments. This research also highlights the importance of calibration, which is often overlooked in model evaluation. Without proper calibration, incorrect decisions can be made in disease treatment and preventive care. Therefore, the current study compares two highly accurate machine learning algorithms, Gradient boosting and Extreme gradient boosting, not only in terms of prediction accuracy but also in terms of model calibration and speed.

Methods: This study analyzed data from Covid-19 patients who were admitted to two hospitals in Mashhad city, Razavi Khorasan province, over a span of 18 months. Five-fold cross-validation (K=5) was applied to the training dataset. The accuracy and calibration of the two methods (Gradient boosting and Extreme gradient boosting) in predicting survival were compared using the Concordance Index and a calibration measure.
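
As a rough illustration of the Concordance Index named above, the following sketch computes Harrell's C for right-censored data in plain Python. It is not the study's implementation: the function name `concordance_index` and the toy inputs are hypothetical, and pairs with tied observed times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Skip tied times (simplifying assumption of this sketch).
        if times[i] == times[j]:
            continue
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
c = concordance_index([5, 8, 12, 20], [1, 0, 1, 1], [0.9, 0.4, 0.1, 0.6])
print(c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the index measures discrimination only; it says nothing about whether predicted survival probabilities are calibrated.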

Results: In the imbalanced data, the Concordance Index values were 0.734 for the gradient boosting model and 0.736 for the Extreme gradient boosting model; in the balanced data, they were 0.893 and 0.894, respectively. For the surv.calib_beta index, the gradient boosting model had an estimated value of 0.59 in the imbalanced data and 0.66 in the balanced data, whereas the Extreme gradient boosting model had an estimated value of 0.853 in the imbalanced data and 0.86 in the balanced data. The Extreme gradient boosting model was also faster in the learning process than the gradient boosting model.

Conclusion: The Gradient boosting and Extreme gradient boosting models exhibited similar prediction accuracy and discrimination power, but the Extreme gradient boosting model demonstrated relatively good calibration compared with the Gradient boosting model.