A Comparative Study of Breast Cancer Detection and Recurrence Prediction Using CatBoost Classifier
DOI:
https://doi.org/10.14311/AP.2025.65.0136
Keywords:
CatBoost classifier, breast cancer, machine learning, bioinformatics
Abstract
In 2019, breast cancer accounted for over one-third of all cancer cases in women in Iraq. The disease affects both men and women, though it is more common in women. This study evaluates six machine learning techniques (CatBoost, XGBoost, Random Forest, SVM, KNN, and Naive Bayes) for detecting breast cancer and predicting its recurrence after recovery. The models are compared on key metrics: sensitivity, specificity, precision, F1 score, accuracy, and the ROC curve with its AUC score. Among all the algorithms examined, CatBoost performed best, with AUC values above 98 %, 90 %, and 83 % on the different datasets. These results indicate that machine learning techniques can significantly improve the accuracy of breast cancer detection and recurrence prediction, helping healthcare providers achieve better patient care outcomes and more effective treatment plans.
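The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and are not data from the study. The sketch assumes binary labels with 1 as the positive class (for example, malignant or recurrence) and computes AUC as the fraction of positive/negative pairs that the model ranks correctly.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics; assumes both classes occur and at least one
    sample is predicted positive, so no denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample receives a higher
    score than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc_score(y_true, scores):.3f}")
```

Sensitivity, specificity, precision, F1, and accuracy all depend on the chosen threshold, whereas AUC summarizes ranking quality across every threshold, which is one reason it is convenient as a headline figure when comparing classifiers on several datasets.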
License
Copyright (c) 2025 Rana Dhia’a Abdu-aljabar, Khansaa Dheya Aljafaar, Zinah Jaffar Mohammed Ameen, Hala A. Naman

This work is licensed under a Creative Commons Attribution 4.0 International License.


