In Silico Identification of Effective Genes for Acute Leukemia Classification Using a Spline Regression-based Framework

  • Maryam Yazdanparast, Department of Pediatrics, Shahid Sadoughi Hospital, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
  • Razieh Sheikhpour, Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
  • Morteza Zangeneh Soroush, Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
  • Fatemeh Ghanizadeh, Hematology and Oncology Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
Keywords: Acute lymphocytic leukemia, Acute myeloid leukemia, Gene expression, Sparse gene selection, Spline regression

Abstract

Background: Microarray technology measures the expression of thousands of genes simultaneously and can be highly effective in identifying various types of cancer, including leukemia. However, many genes in microarray data are redundant and carry little useful information for cancer diagnosis. The main objective of this study is to identify relevant and effective genes for classifying leukemia microarray data using a spline regression-based method that takes the correlation between genes into account.

Materials and Methods: In this analytical study, leukemia microarray data are used to identify the genes relevant to classifying leukemia as Acute Myeloid Leukemia (AML) or Acute Lymphoblastic Leukemia (ALL). Genes are selected with a spline regression-based gene selection method called SRS3FS, which is based on the ℓ2,p-norm (0 < p ≤ 1). The support vector machine (SVM) algorithm is then used to classify the leukemia data into AML and ALL.
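The abstract does not define the norm it uses. As a minimal sketch, assuming the usual definition from the sparse feature selection literature, the 2,p-norm of a weight matrix W (one row per gene) is the sum of the p-th powers of the row-wise Euclidean norms, raised to the power 1/p. The matrix `W` and the helper names `l2p_norm` and `top_genes` below are hypothetical and are not taken from SRS3FS; ranking genes by row norm is a common convention in this family of methods and is assumed here rather than confirmed by the source.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per gene)."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]


def l2p_norm(W, p):
    """Assumed definition: ||W||_{2,p} = (sum_i ||w_i||_2 ** p) ** (1 / p)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(r ** p for r in row_norms(W)) ** (1.0 / p)


def top_genes(W, k):
    """Indices of the k genes with the largest row norms (assumed ranking rule)."""
    norms = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return order[:k]


# Hypothetical weight matrix: 4 genes x 2 classes (illustrative values only).
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8], [0.0, 2.0]]

norm_p1 = l2p_norm(W, 1.0)      # p = 1 gives the plain sum of row norms
norm_p_half = l2p_norm(W, 0.5)  # one of the p values named in the abstract
selected = top_genes(W, 2)      # the two highest-ranked gene indices
```

Under this definition, a row of zeros contributes nothing to the norm, which is why penalizing it tends to switch off whole genes instead of individual coefficients. The expression values of the selected genes would then be passed to a classifier such as an SVM.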

Results: The SVM classification results for 5, 10, 15, and 20 selected genes show that the SRS3FS method with the ℓ2,1/4-norm, ℓ2,1/2-norm, and ℓ2,3/4-norm achieved its highest accuracy, 97.06%, when 10 genes were used to distinguish AML from ALL. Moreover, the leukemia data were classified into AML and ALL with an accuracy of 100% using a gene identified by the SRS3FS method based on the ℓ2,3/4-norm and ℓ2,1-norm. The gene labeled as number 3252, annotated as GLUTATHIONE S-TRANSFERASE, MICROSOMAL, was identified as the most important gene.

Conclusion: The experimental results on leukemia microarray data demonstrate that the spline regression-based gene selection method can effectively identify genes relevant to the classification and prediction of leukemia.

Published: 2024-04-04
Section: Articles