Sparse Variable Selection in Competing Risks Additive Hazards Regression: An Application for Identifying Biomarkers Related to Prognosis of Bladder Cancer

  • Leili Tapak Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran.
  • Michael R. Kosorok Department of Biostatistics, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, USA.
  • Omid Hamidi Department of Science, Hamedan University of Technology, Hamedan, Iran.
  • Mahya Arayeshgari Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
Keywords: Competing risks; Subdistributions; Microarray; Additive hazards model; Variable selection; LASSO

Abstract

Introduction: Variable selection has become a key step in biomedical research, particularly in the analysis of high-throughput genomic data. A major focus is selecting gene expression profiles associated with time-to-event outcomes such as death. A significant challenge in this setting is the presence of competing risks, in which identifying a small subset of gene expression profiles related to the cumulative incidence function (CIF) is essential.
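For reference, the CIF and the subdistribution hazard can be written with the standard definitions below. Here T is the event time, ε is the cause of failure (ε = 1 is the event of interest), and Z is the covariate vector, for example gene expression profiles. This notation is ours and is not taken from the article.

```latex
F_1(t \mid Z) = \Pr(T \le t,\ \varepsilon = 1 \mid Z)

\lambda_1(t \mid Z)
  = \lim_{\Delta t \to 0} \frac{1}{\Delta t}
    \Pr\{t \le T < t + \Delta t,\ \varepsilon = 1 \mid
         T \ge t \ \text{or}\ (T < t,\ \varepsilon \ne 1),\ Z\}
  = -\frac{d}{dt} \log\{1 - F_1(t \mid Z)\}

F_1(t \mid Z) = 1 - \exp\Big\{-\int_0^t \lambda_1(s \mid Z)\, ds\Big\}
```

The risk set in the subdistribution hazard keeps subjects who have already failed from a competing cause. Because of this, a model for λ₁ translates directly into a model for the CIF.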

Methods: Several methods have been proposed for modeling the CIF directly, primarily by modeling the subdistribution hazard function of the event of interest. We proposed a regularized method for variable selection in the additive subdistribution hazards model that combines the pseudoscore method with five penalized likelihood approaches: the Least Absolute Shrinkage and Selection Operator (LASSO), Adaptive LASSO (ALASSO), Elastic Net (ENET), Adaptive Elastic Net (AENET), and Smoothly Clipped Absolute Deviation (SCAD). We conducted Monte Carlo simulations to evaluate the performance of the proposed method. The method was then applied to a publicly available dataset of 301 patients from five countries who were diagnosed with non-muscle-invasive bladder carcinoma between 1987 and 2000.
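Pseudoscore methods for additive hazards models lend themselves to penalization. In the Lin and Ying form, the estimating function is linear in β, U(β) = b − Aβ, so penalized estimation reduces to minimizing ½βᵀAβ − bᵀβ plus a penalty. The sketch below assumes that A and b have already been computed from the data. For the subdistribution hazard, that step would involve inverse-probability-of-censoring weights, which are not shown. The sketch then solves the LASSO, ENET, and SCAD problems by cyclic coordinate descent, with the adaptive variants obtained by reweighting the L1 term. Several elements are illustrative assumptions and not the authors' implementation: the function names, the ridge-based adaptive weights, the SCAD constant γ = 3.7, and the synthetic A and b.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def scad_update(z, v, lam, gamma=3.7):
    """Exact minimizer of 0.5*v*b**2 - z*b + SCAD_lam(|b|) (Fan and Li penalty).

    Requires v * (gamma - 1) > 1 so the one-dimensional problem is convex.
    """
    if v * (gamma - 1.0) <= 1.0:
        raise ValueError("coordinate curvature too small for a convex SCAD update")
    if abs(z) <= lam * (1.0 + v):  # LASSO-like region
        return soft_threshold(z, lam) / v
    if abs(z) <= v * gamma * lam:  # transition region of the SCAD penalty
        return soft_threshold(z, gamma * lam / (gamma - 1.0)) / (v - 1.0 / (gamma - 1.0))
    return z / v  # large signals are left unpenalized


def adaptive_weights(A, b, ridge=1e-2):
    """Hypothetical helper: weights 1/|beta_init| from a ridge solution of A beta = b."""
    init = np.linalg.solve(A + ridge * np.eye(len(b)), b)
    return 1.0 / np.maximum(np.abs(init), 1e-8)


def penalized_pseudoscore(A, b, penalty="lasso", lam=0.1, alpha=0.5,
                          weights=None, gamma=3.7, max_iter=500, tol=1e-8):
    """Minimize 0.5*beta'A beta - b'beta + penalty by cyclic coordinate descent.

    penalty: "lasso", "enet" (lam*[alpha*|beta| + (1-alpha)/2*beta**2]) or "scad".
    Passing `weights` turns LASSO/ENET into ALASSO/AENET (ignored for SCAD).
    """
    p = len(b)
    beta = np.zeros(p)
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    l1 = lam * alpha if penalty == "enet" else lam
    l2 = lam * (1.0 - alpha) if penalty == "enet" else 0.0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # z = b_j - sum_{k != j} A_jk beta_k: partial pseudoscore for coordinate j
            z = b[j] - A[j] @ beta + A[j, j] * beta[j]
            if penalty == "scad":
                new = scad_update(z, A[j, j], lam, gamma)
            else:
                new = soft_threshold(z, l1 * w[j]) / (A[j, j] + l2)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Synthetic stand-ins for the pseudoscore quantities A and b (illustration only).
    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    A = X.T @ X / n
    beta_true = np.array([1.0, -0.8, 0.6] + [0.0] * (p - 3))
    b = A @ beta_true + 0.005 * rng.standard_normal(p)
    w = adaptive_weights(A, b)
    fits = {
        "LASSO": penalized_pseudoscore(A, b, "lasso", lam=0.2),
        "ALASSO": penalized_pseudoscore(A, b, "lasso", lam=0.2, weights=w),
        "ENET": penalized_pseudoscore(A, b, "enet", lam=0.2),
        "AENET": penalized_pseudoscore(A, b, "enet", lam=0.2, weights=w),
        "SCAD": penalized_pseudoscore(A, b, "scad", lam=0.2),
    }
    for name, est in fits.items():
        print(name, np.flatnonzero(est), np.round(est[:3], 3))
```

The quadratic form keeps every coordinate update in closed form. This design choice is why LASSO-type penalties are convenient for additive hazards models. In practice λ would be chosen by cross-validation or an information criterion, which is not shown here.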

Results: The proposed method was evaluated in simulation studies and applied to the genomic data, with progression-free survival as the endpoint, to identify genes associated with the CIF of bladder cancer in the presence of competing events. Five genes (DCTD, IGF1R, NCF2, PLEK, and CDC20) were consistently identified across the different penalties. Overexpression of DCTD and IGF1R was associated with a decreased cumulative incidence of bladder cancer progression or death, whereas overexpression of NCF2, PLEK, and CDC20 was associated with an increased cumulative incidence of these events.
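The direction of each association can be read from the sign of the corresponding coefficient. Assume the additive subdistribution hazards model takes the common form below, with time-invariant covariates, λ₁₀ an unspecified baseline subdistribution hazard, and Λ₁₀ its cumulative version (notation ours). Under that assumption, the CIF depends on covariate Z_j only through t β_j:

```latex
\lambda_1(t \mid Z) = \lambda_{10}(t) + \beta^\top Z

F_1(t \mid Z) = 1 - \exp\{-\Lambda_{10}(t) - t\,\beta^\top Z\}

\frac{\partial F_1(t \mid Z)}{\partial Z_j} = t\,\beta_j \exp\{-\Lambda_{10}(t) - t\,\beta^\top Z\}
```

For t > 0, the CIF therefore increases with Z_j when β_j > 0 and decreases when β_j < 0. On this reading, DCTD and IGF1R would carry negative coefficients and NCF2, PLEK, and CDC20 positive ones. The abstract itself does not report the coefficient values.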

Conclusion: In the simulation studies, all penalties yielded comparable sensitivity and specificity; however, the AENET and ALASSO penalties achieved superior estimation accuracy.
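In this setting, sensitivity and specificity describe how well a penalty recovers the set of truly nonzero coefficients in simulated data. The abstract does not say which measure of estimation accuracy was used. The sketch below therefore assumes a squared-error loss averaged over Monte Carlo replicates, and the function names are hypothetical.

```python
import numpy as np


def selection_metrics(beta_hat, beta_true):
    """Return (sensitivity, specificity, squared-error loss) for one fitted coefficient vector.

    sensitivity = share of truly nonzero coefficients that were selected
    specificity = share of truly zero coefficients that were left out
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    selected = beta_hat != 0
    active = beta_true != 0
    sensitivity = np.sum(selected & active) / np.sum(active)
    specificity = np.sum(~selected & ~active) / np.sum(~active)
    sq_error = np.sum((beta_hat - beta_true) ** 2)
    return float(sensitivity), float(specificity), float(sq_error)


def summarize_replicates(estimates, beta_true):
    """Average the per-replicate metrics over Monte Carlo replicates (one estimate per row)."""
    metrics = np.array([selection_metrics(est, beta_true) for est in estimates])
    return metrics.mean(axis=0)


if __name__ == "__main__":
    beta_true = [1.0, -0.8, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0]
    beta_hat = [0.9, -0.7, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0]
    # one true signal missed and one false positive: sensitivity 2/3, specificity 0.8
    print(selection_metrics(beta_hat, beta_true))
```

With this convention, two penalties can share the same selected set, and so the same sensitivity and specificity, while differing in squared error. The difference comes from how strongly each penalty shrinks the retained coefficients.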

Published: 2025-08-01
Section: Articles