Logistic Regression Model Based on Ultrafast Pulse Wave Velocity and Different Feature Selection Methods to Predict the Risk of Hypertension

  • Xue Bai School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
  • Wenjun Liu School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
  • Hui Huang Department of Ultrasound, Affiliated Hospital of Nanjing University of CM, Nanjing 210029, China
  • Huan You School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
Keywords: Hypertension; Ultrafast pulse wave velocity; Feature selection; Logistic regression

Abstract

Background: Hypertension is the main reason why the incidence of cardiovascular disease has increased year by year, and early diagnosis of hypertension is necessary to reduce that incidence. This places higher demands on diagnostic accuracy. We tried a variety of feature selection methods to improve the accuracy of logistic regression (LR).

Methods: We collected 397 samples from Nanjing, Jiangsu, China between Jan 2016 and Dec 2017, including 178 hypertension samples and 219 control samples. The data set includes not only clinical and laboratory data but also imaging data. We focused on the differences in imaging attributes between the control group and the hypertension group, and analyzed the correlation coefficients of all attributes. To establish the optimal LR model, this study tried three different feature selection methods: statistical analysis, random forest (RF), and extreme gradient boosting (XGBoost). The area under the receiver operating characteristic (ROC) curve (AUC) and accuracy were used as the main criteria for model evaluation.
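The RF-then-LR approach described in the Methods can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic (generated with `make_classification` to match only the sample count of 397), and the 80/20 split, the median importance threshold, the number of trees, and the standardization step are all assumptions that the abstract does not specify. Any estimator exposing `feature_importances_` could in principle replace the random forest in the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data (397 samples); NOT the real data set.
X, y = make_classification(
    n_samples=397, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: rank features by random forest importance and keep those at or
# above the median importance (assumed threshold, not from the paper).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
)

# Step 2: fit logistic regression on the selected, standardized features.
model = make_pipeline(selector, StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 3: evaluate on the held-out split with accuracy and AUC.
proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, proba)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print(f"selected features: {n_selected}, accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
```

Placing the selector inside the pipeline means feature selection is fitted on the training split only, which avoids leaking test information into the choice of features.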

Results: In the prediction of hypertension, LR with RF as the feature selection method (accuracy: 0.910; AUC: 0.924) achieved higher accuracy than LR with XGBoost as the feature selection method (accuracy: 0.897; AUC: 0.915) and LR with statistical analysis as the feature selection method (accuracy: 0.872; AUC: 0.926), although the AUC obtained with statistical analysis was marginally higher.
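For readers unfamiliar with the two reported metrics, the sketch below computes them from scratch on toy data. The labels, scores, and the 0.5 decision threshold are invented for illustration and are unrelated to the study's data. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted scores, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")  # 4 of 6 correct
print(f"AUC: {roc_auc(y_true, scores):.3f}")        # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen decision threshold, whereas AUC depends only on how the scores rank the samples, so the two metrics can order models differently.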

Conclusion: LR with RF as the feature selection method may provide accurate results in predicting hypertension. Carotid intima-media thickness (cIMT) and pulse wave velocity at the end of systole (ESPWV) are two key imaging indicators in the prediction of hypertension.

Published: 2022-09-11
Section: Articles