In Silico Identification of Effective Genes for Acute Leukemia Classification Using a Spline Regression-based Framework
Abstract
Background: Microarray technology enables gene expression to be measured across thousands of genes and can be highly effective in identifying various types of cancer, including leukemia. However, many genes in microarray data are redundant and carry little information useful for cancer diagnosis. The main objective of this study is to identify genes that are relevant and effective for classifying leukemia microarray data, using a spline regression-based method that takes the correlation between genes into account.
Materials and Methods: In this analytical study, a spline regression-based gene selection method called SRS3FS, which is based on the ℓ2,p-norm (0 < p ≤ 1), is applied to leukemia microarray data to identify genes relevant for distinguishing Acute Myeloid Leukemia (AML) from Acute Lymphoblastic Leukemia (ALL). Subsequently, the support vector machine (SVM) algorithm is employed to classify the leukemia data into AML and ALL.
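For readers unfamiliar with the ℓ2,p-norm, the following minimal sketch shows how it is computed for a weight matrix and how rows can be ranked by their ℓ2 norm. It is an illustration under stated assumptions, not the SRS3FS implementation: the matrix `W` (one row per gene), the function names, and the choice to rank genes by row norm are hypothetical and follow a common convention in ℓ2,p-regularized feature selection. For 0 < p < 1 the quantity is a quasi-norm rather than a true norm.

```python
import math

def row_l2_norms(W):
    """Euclidean norm of each row of W (W is a list of rows)."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]

def l2p_norm(W, p):
    """l2,p-(quasi-)norm: (sum_i ||w_i||_2 ** p) ** (1 / p), for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(n ** p for n in row_l2_norms(W)) ** (1.0 / p)

def top_k_rows(W, k):
    """Indices of the k rows with the largest l2 norm.

    Assumption for illustration: each row corresponds to one gene and a
    larger row norm means a more important gene. Ties keep original order.
    """
    norms = row_l2_norms(W)
    order = sorted(range(len(W)), key=lambda i: -norms[i])
    return order[:k]

# Toy weight matrix with three "genes" (rows); the values are made up.
W = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(l2p_norm(W, 1.0))   # l2,1-norm: sum of the row norms
print(l2p_norm(W, 0.5))   # l2,1/2 quasi-norm
print(top_k_rows(W, 2))   # indices of the two rows with the largest norm
```

Smaller values of p penalize the number of non-zero rows more strongly, which is why such penalties tend to produce sparser gene subsets.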
Results: The SVM classification results for 5, 10, 15, and 20 genes showed that the SRS3FS method with the ℓ2,1/4-norm, ℓ2,1/2-norm, and ℓ2,3/4-norm achieved the highest accuracy, 97.06%, when 10 genes were selected to distinguish between AML and ALL. Moreover, the leukemia data were classified into AML and ALL with an accuracy of 100% using a gene identified by the SRS3FS method based on the ℓ2,3/4-norm and the ℓ2,1-norm. The gene labeled as number 3252, annotated as GLUTATHIONE S-TRANSFERASE, MICROSOMAL, was identified as the most important gene.
Conclusion: The experimental results on leukemia microarray data demonstrate that the spline regression-based gene selection method can effectively identify genes relevant to the classification and prediction of leukemia.