Variable Selection for Recurrent Events Using Heuristic Approaches: Identifying Informative Variables for Rehospitalization in Schizophrenia Patients

  • Mahya Arayeshgar, Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran.
  • Leili Tapak, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
  • Sharareh Parami, Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran.
  • Behnaz Alafchi, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran.
Keywords: Random forest; Recursive feature elimination; Deviance residual; Recurrent event datasets; Variable selection; Schizophrenia

Abstract

Introduction: Recurrent event data, a generalization of survival data, arise frequently in medical research; sequential hospitalizations of patients with schizophrenia are one example. Because multiple relapses in schizophrenia can lead to self-harm or harm to others, loss of education or employment, and other adverse outcomes, identifying the factors most strongly related to relapse is essential. This study aimed to use heuristic approaches to select predictor variables for recurrent events, with an application to schizophrenia.

Methods: A two-step algorithm combined two variable selection methods, recursive feature elimination (RFE) and genetic algorithm feature selection (GAFS), with four modeling techniques, gradient boosting (GB), artificial neural network (ANN), random forest (RF), and support vector machine (SVM), and was applied to simulated recurrent event datasets.
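
As an illustration of one of these combinations, the following is a minimal sketch of recursive feature elimination wrapped around a random forest, written with scikit-learn. It is not the authors' implementation. The synthetic data, the continuous working response `y`, the choice to keep two features, and the forest settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic data (assumption): 8 candidate predictors, of which only
# columns 0 and 3 influence the continuous working response y.
rng = np.random.default_rng(0)
n_subjects, n_features = 300, 8
X = rng.normal(size=(n_subjects, n_features))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n_subjects)

# Random forest supplies the feature importances that RFE ranks on.
forest = RandomForestRegressor(n_estimators=100, random_state=0)

# RFE refits the forest repeatedly, dropping the least important
# feature each round (step=1) until two features remain.
selector = RFE(estimator=forest, n_features_to_select=2, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)  # indices of retained columns
ranking = selector.ranking_                   # 1 = retained, larger = dropped earlier
print("selected columns:", selected.tolist())
```

Swapping `RandomForestRegressor` for another estimator that exposes feature importances or coefficients would give the other RFE-based combinations. The number of features to keep is fixed here for brevity and would normally be tuned, for example by cross-validation.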

Results: In most simulation scenarios, the combination of RFE and RF applied to the deviance residual (RFE-RF-DR) outperformed the other methods. In the schizophrenia application, RFE-RF-DR selected the following predictor variables: number of children, age, marital status, and history of substance abuse.
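
The abstract does not state how the deviance residual was computed. A common choice for counting-process data is the Poisson-type form d_i = sign(M_i) * sqrt(2 * [N_i * log(N_i / Λ_i) - M_i]), where N_i is the observed number of events for subject i, Λ_i is the expected number under a fitted model, and M_i = N_i - Λ_i is the martingale residual. The sketch below assumes that form; the function name and the example values are hypothetical.

```python
import math

def deviance_residual(n_events, expected):
    """Poisson-type deviance residual for one subject (assumed form).

    n_events: observed number of events N_i (non-negative integer)
    expected: model-based expected number of events, Lambda_i (> 0)
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    martingale = n_events - expected  # M_i = N_i - Lambda_i
    if n_events == 0:
        # The N_i * log(N_i / Lambda_i) term vanishes in the limit N_i -> 0.
        deviance = 2.0 * expected
    else:
        deviance = 2.0 * (n_events * math.log(n_events / expected) - martingale)
    sign = (martingale > 0) - (martingale < 0)
    # max() guards against tiny negative values caused by rounding.
    return sign * math.sqrt(max(deviance, 0.0))

# Hypothetical subjects: (observed rehospitalizations, expected count)
subjects = [(0, 0.5), (2, 2.0), (3, 1.0)]
residuals = [deviance_residual(n, lam) for n, lam in subjects]
```

A subject with fewer events than expected gets a negative residual, one with more gets a positive residual, and the transformation is more symmetric than the raw martingale residual, which makes it a more convenient continuous response for a regression-type learner.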

Conclusion: Our findings indicate that the proposed machine learning-based approach is a promising technique for selecting predictor variables associated with a recurrent outcome when analyzing multivariate time-to-event data with recurrent events.

Published: 2023-10-31
Section: Articles