Variable Selection for Recurrent Events Using Heuristic Approaches: Identifying Informative Variables for Rehospitalization in Schizophrenia Patients
Abstract
Introduction: Recurrent event data, a generalization of survival data, are frequently observed in medical research; sequential hospitalizations of patients with schizophrenia are one example. Because multiple relapses in schizophrenia can have serious consequences, such as self-harm or harm to others, loss of education or employment, and other adverse outcomes, identifying the factors most strongly related to relapse is essential. This study aimed to use heuristic approaches to select predictor variables for recurrent events, with an application to schizophrenia.
Methods: A two-step algorithm was applied to simulated recurrent event datasets. It combined two variable selection methods, recursive feature elimination (RFE) and genetic algorithm feature selection (GAFS), with four modeling techniques: gradient boosting (GB), artificial neural network (ANN), random forest (RF), and support vector machine (SVM).
Results: In most simulation scenarios, the combination of RFE and RF applied to the deviance residual (DR) outperformed the other methods. The RFE-RF-DR combination selected the following predictor variables: number of children, age, marital status, and history of substance abuse.
Conclusion: Our findings indicate that the proposed machine learning-based model is a promising technique for selecting predictor variables associated with a recurrent outcome when analyzing multivariate time-to-event data with recurrent events.
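The selection step described in the Methods can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes scikit-learn, and it replaces the first step (computing deviance residuals from a recurrent event model) with a synthetic continuous target named `residual`. The sample size, the number of candidate predictors, and the choice of columns 0 and 3 as the informative ones are all hypothetical. RFE then repeatedly fits a random forest to that target and drops the least important predictor until the requested number remains.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n, p = 300, 8  # hypothetical sample size and number of candidate predictors
X = rng.normal(size=(n, p))

# Assumption: a synthetic stand-in for step 1. In the study, this target
# would be the deviance residuals from a fitted recurrent event model.
# Here only columns 0 and 3 carry signal.
residual = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

# Step 2: recursive feature elimination wrapped around a random forest.
# Each round refits the forest and removes the predictor with the
# lowest impurity-based importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
selector = RFE(estimator=forest, n_features_to_select=2, step=1)
selector.fit(X, residual)

selected = np.flatnonzero(selector.support_)  # indices of retained columns
print("Selected predictor columns:", selected)
```

In practice the number of predictors to retain would be tuned, for example by cross-validation, rather than fixed in advance as it is here.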