Multiclass Response Feature Selection and Cancer Tumour Classification With Support Vector Machine

  • A. W. Banjoko
  • W. B. Yahya
  • M. K. Garba
Keywords: Support Vector Machines; Monte-Carlo Cross-Validation; F-Statistic; Family-wise error rate; Misclassification Error Rate.

Abstract

Background & Aim: In this study, an efficient Support Vector Machine (SVM) algorithm is proposed for feature selection and classification of multi-category tumour classes of biological samples using gene expression profiles.

Methods: The feature selection stage of the algorithm employed the F-statistic of an ANOVA-like testing scheme at a chosen family-wise error rate, which ensured efficient detection of false-positive genes. The gene subsets selected by this method were further screened for optimality using the misclassification error rates yielded by each of them and by their combinations in a sequential selection manner. In a 10-fold cross-validation, the optimal values of the SVM parameters with an appropriate kernel were determined for tissue sample classification using the one-versus-all approach. The entire data matrix was randomly partitioned into a 95% training set to train the SVM classifier and a 5% test set to evaluate the predictive performance of the classifier, over 1,000 Monte-Carlo cross-validation runs. A published microarray breast cancer dataset with five clinical endpoints was employed to validate the results from the simulation studies.
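The screening and evaluation steps described in the Methods can be illustrated with a minimal sketch, which is not the authors' implementation. It computes a one-way ANOVA F-statistic for a single gene across tumour classes, generates repeated random 95%/5% partitions, and scores predictions by misclassification error rate. The function names (`anova_f`, `monte_carlo_splits`, `misclassification_rate`) are hypothetical and use only the Python standard library. The family-wise error rate threshold and the SVM itself are left out: the abstract does not state which multiplicity correction or kernel was used, so a p-value from the F distribution and any one-versus-all SVM implementation would have to be supplied separately.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for one gene across tumour classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Degenerate case: no variation inside any class.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def monte_carlo_splits(n_samples, test_fraction=0.05, runs=1000, seed=0):
    """Yield (train_idx, test_idx) for repeated random partitions.

    The defaults mirror the 95%/5% split and 1,000 runs in the abstract;
    the seed is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def misclassification_rate(y_true, y_pred):
    """Fraction of test samples assigned to the wrong class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In use, genes would be ranked by `anova_f`, a classifier would be trained on each `train_idx` and evaluated on the matching `test_idx`, and the error rates would be averaged over all runs.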

Results: Results from the Monte-Carlo study showed excellent performance of the SVM classifier, with high prediction accuracy for the tissue samples based on the few gene biomarkers selected by the proposed feature selection method.

Conclusion: SVM could be considered as a classification of multi-category tumour classes of biological



Published
2020-02-03
Section
Articles