Statistical Considerations in Combining Multiple Biomarkers for Diagnostic Classification: Logistic Regression Risk Score Versus Discriminant Function Score

Karimollah Hajian-Tilaki; Zahra Graili; Vahid Nassiri

doi:10.18502/jbe.v8i2.10412

Karimollah Hajian-Tilaki Department of Biostatistics and Epidemiology, School of Public Health, Babol University of Medical Sciences, Babol, Iran.
Zahra Graili Social Determinants of Health Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran.
Vahid Nassiri Open Analytics, Jupiterstraat 20, B-2600, Antwerpen, Belgium.


Keywords: Logistic regression model; Discriminant function score; ROC analysis; Area under the curve (AUC); Combining multiple biomarkers

Abstract

Introduction: In clinical practice, multiple biomarkers are frequently measured on the same subjects to diagnose an adverse outcome. This study compares two alternative approaches to forming a linear combination of several markers: the logistic regression risk score and the discriminant function score.

Methods: Ten thousand simulated data sets were generated from binormal and non-binormal pairs of distributions under different sample sizes and correlation structures. Logistic regression and discriminant analysis were both fitted to each data set. ROC analysis was performed for each marker alone and for the two combined scores. For both approaches, the mean AUC and its root mean square error (RMSE) were estimated over the 10,000 replications for every configuration and sample size. The practical utility of the two methods is further illustrated with a clinical example using real data.
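
The simulation design described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two-marker binormal setting, the mean shift, the correlation of 0.5, the group size of 100, the reduced number of replications (200 rather than 10,000), and all function names are hypothetical choices made here. The RMSE is computed against the theoretical AUC of the optimal linear combination under equal covariance, which is also an assumption about the reference value.

```python
import math
import numpy as np

def auc_mann_whitney(scores, labels):
    """Empirical AUC: P(diseased score > healthy score) + 0.5 * P(tie)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def discriminant_score(X, y):
    """Fisher linear discriminant score with pooled within-group covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
              + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, X1.mean(axis=0) - X0.mean(axis=0))
    return X @ w

def logistic_risk_score(X, y, n_iter=25):
    """Linear predictor of a logistic regression fitted by Newton-Raphson."""
    A = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = np.clip(A @ beta, -30.0, 30.0)   # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (illustrative choice).
        H = A.T @ (A * W[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(H, A.T @ (y - p))
    return A @ beta

def simulate_once(rng, n_per_group, cov, mean_shift):
    """One binormal data set: healthy ~ N(0, cov), diseased ~ N(mean_shift, cov)."""
    X0 = rng.multivariate_normal(np.zeros(len(mean_shift)), cov, size=n_per_group)
    X1 = rng.multivariate_normal(mean_shift, cov, size=n_per_group)
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_group)

# Assumed configuration (not taken from the paper).
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
mean_shift = np.array([1.0, 0.5])
n_rep = 200

# Theoretical AUC of the best linear combination under equal covariance:
# Phi(sqrt(delta' Sigma^{-1} delta / 2)).
d2 = mean_shift @ np.linalg.solve(cov, mean_shift)
true_auc = 0.5 * (1.0 + math.erf(math.sqrt(d2 / 2.0) / math.sqrt(2.0)))

rng = np.random.default_rng(0)
auc_lr = np.empty(n_rep)
auc_df = np.empty(n_rep)
for r in range(n_rep):
    X, y = simulate_once(rng, 100, cov, mean_shift)
    auc_lr[r] = auc_mann_whitney(logistic_risk_score(X, y), y)
    auc_df[r] = auc_mann_whitney(discriminant_score(X, y), y)

rmse_lr = math.sqrt(np.mean((auc_lr - true_auc) ** 2))
rmse_df = math.sqrt(np.mean((auc_df - true_auc) ** 2))
print(f"theoretical AUC: {true_auc:.4f}")
print(f"logistic risk score: mean AUC {auc_lr.mean():.4f}, RMSE {rmse_lr:.4f}")
print(f"discriminant score:  mean AUC {auc_df.mean():.4f}, RMSE {rmse_df:.4f}")
```

Because both scores are evaluated on the same data used to fit them, the AUC estimates in this sketch are apparent (resubstitution) values and can be slightly optimistic. A non-binormal configuration could be explored by replacing `simulate_once` with a generator for skewed marker distributions.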

Results: The two approaches yielded identical accuracy, in particular with binormal data. With non-binormal data, the logistic regression risk score produced accuracy equal to or slightly better than that of the discriminant function score.

Conclusion: Overall, the two approaches yield nearly identical results. However, with non-binormal data, the logistic regression model may achieve slightly better accuracy than discriminant analysis.