Using Data Mining Techniques for Early Diagnosis of Breast Cancer
Abstract
Aim: The present study aimed to compare six data mining approaches and identify the best-performing method for predicting breast cancer.
Method: Six classification methods were applied to breast cancer detection: Random Forest (RF), Neural Network (NN), Support Vector Machine (SVM), Auto Multilayer Perceptron (AutoMLP), Naïve Bayes (NB), and Deep Learning (DL). Data on 116 participants (patients and healthy people) from the UCI repository, with nine predictors, were used for training and testing. The data were first divided into a training set and a test set: the training set (70%) was used to build the models, and the test set (30%) was used to validate them.
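The 70/30 partition described above can be sketched as follows. This is a minimal illustration only, not the study's actual procedure: the function name `split_70_30`, the fixed seed, and the placeholder records are assumptions, and the abstract does not say whether the split was random or stratified.

```python
import random


def split_70_30(records, train_fraction=0.7, seed=42):
    """Shuffle the records and split them into training and test lists.

    Hypothetical helper: the seed and the plain random shuffle are
    assumptions made for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder rows standing in for the 116 records (not real data).
records = [{"id": i} for i in range(116)]
train, test = split_70_30(records)
print(len(train), len(test))  # 81 35
```

With 116 records, a 70% training share rounds to 81 rows, leaving 35 rows for testing.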
Results: To compare the performance of the techniques in diagnosing breast cancer, accuracy, recall, precision, AUC (area under the ROC curve), sensitivity, and specificity were calculated and reported for all six approaches. Deep learning achieved the highest accuracy (81.89%) and outperformed the other techniques. One-way ANOVA on the performance of the six modeling methods showed no statistically significant difference between the methods (P-value <0.05).
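As a rough illustration of how several of the reported measures follow from a confusion matrix, the sketch below computes accuracy, precision, recall (sensitivity), and specificity for a binary classifier. The helper `binary_metrics` and the label vectors are invented for the example and are not results from the study. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # same quantity as sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels for illustration only (1 = patient, 0 = healthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For these toy labels there are 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, giving an accuracy of 0.7, a precision of 0.6, a recall of 0.75, and a specificity of about 0.667.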
Conclusion: Choosing the most effective computer-based diagnostic methods can support a comprehensive system for the early detection of breast cancer. By reducing the cost of treating patients and increasing the quality of services offered, these intelligent methods are a practical step toward improving medical care and making diagnosis more systematic.