Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

  • Sanaz Mehrabani Non-Communicable Pediatric Diseases Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran
  • Morteza Zangeneh Soroush Department of Biomedical Engineering, Science and Research branch, Islamic Azad University, Tehran, Iran
  • Negin Kheiri Shiraz University of Medical Sciences, Shiraz, Iran
  • Razieh Sheikhpour Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
  • Mahshid Bahrami Department of Radiology, Isfahan University of Medical Sciences, Isfahan, Iran
Keywords: Gene expression data, Gene selection, Acute myeloid leukemia, Acute lymphoblastic leukemia

Abstract

Background: DNA microarray technology simultaneously measures the expression of thousands of genes and can be used to detect cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method.

Materials and Methods: In this descriptive study, microarray gene expression data from 72 patients with acute myeloid leukemia (AML) or acute lymphoblastic leukemia (ALL) were used. To remove redundant genes and identify the genes most important for predicting AML and ALL, a robust ℓ2,p-norm (0 < p ≤ 1) sparsity-based gene selection method was applied, with the parameter p set to 1/4, 1/2, 3/4, and 1. The selected genes were then used as input to random forest (RF) and support vector machine (SVM) classifiers to predict AML and ALL.
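
The gene selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain least-squares loss with an ℓ2,p-norm penalty on the rows of a weight matrix (the abstract's robust variant may use a different loss), solved by iteratively reweighted least squares. The function name l2p_gene_ranking and the parameters lam, n_iter, and eps are hypothetical choices made for illustration only.

```python
import numpy as np


def l2p_gene_ranking(X, y, p=0.5, lam=1.0, n_iter=30, eps=1e-8):
    """Rank genes by the row norms of W, where W approximately minimizes
    ||XW - Y||_F^2 + lam * sum_i ||w_i||_2^p   (0 < p <= 1).

    Assumption: least-squares loss, solved by iteratively reweighted
    least squares. X is (samples x genes), y holds class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # One-hot encode the labels and center both X and Y.
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    Y = Y - Y.mean(axis=0)

    # Standardize each gene so that row norms of W are comparable.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std

    # d_inv holds the inverse of the diagonal reweighting matrix D.
    # Starting from all ones makes the first iterate a ridge solution.
    d_inv = np.ones(d)
    for _ in range(n_iter):
        XD = X * d_inv                       # X @ D^{-1}, shape (n, d)
        A = XD @ X.T + lam * np.eye(n)       # n x n system, cheap when d >> n
        W = XD.T @ np.linalg.solve(A, Y)     # equals (X^T X + lam D)^{-1} X^T Y
        row_sq = np.sum(W ** 2, axis=1)
        # D_ii = (p/2) * (||w_i||^2 + eps)^((p-2)/2); store its inverse.
        d_inv = (2.0 / p) * (row_sq + eps) ** ((2.0 - p) / 2.0)

    scores = np.sqrt(np.sum(W ** 2, axis=1))
    order = np.argsort(scores)[::-1]         # most important gene first
    return order, scores
```

Under these assumptions, calling l2p_gene_ranking(X, y, p=p) for each p in (0.25, 0.5, 0.75, 1.0) and keeping the first 10 indices of the returned order would give one candidate gene set per value of p. Those columns of X could then be passed to a classifier such as scikit-learn's RandomForestClassifier or SVC.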

Results: The RF and SVM classifiers correctly classified all AML and ALL samples. The RF classifier achieved a performance of 100% using 10 genes selected by the ℓ2,1/2-norm and ℓ2,1-norm sparsity-based gene selection methods, and the SVM classifier achieved a performance of 100% using 10 genes selected by the ℓ2,1/2-norm method. Seven genes were selected under all four values of p in the ℓ2,p-norm method as the most important genes for classifying AML and ALL, and the gene described as “PRTN3 Proteinase 3 (serine proteinase, neutrophil, Wegener granulomatosis autoantigen)” was identified as the most important gene.

Conclusion: The results of this study indicate that blood cancer can be predicted from leukemia microarray gene expression data using the robust ℓ2,p-norm sparsity-based gene selection method together with classification algorithms. Examining the expression levels of the genes identified in this study may be useful for predicting leukemia.

Published
2023-01-03