MACHINE LEARNING APPROACHES FOR HEALTHCARE DATA ANALYSIS

Breast cancer is the most common cancer in women worldwide and remains the leading cause of cancer-related death in women globally. Machine learning techniques have proven helpful in the prognosis and diagnosis of various health-related conditions. This work compares five machine learning (ML) algorithms, Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Tree (DT), and Random Forest (RF), on the Breast Cancer Wisconsin Diagnostic (BCWD) dataset. The features were extracted from digitized images of fine needle aspirate (FNA) tests on a breast mass. Results show that Random Forest performs best among the models across classification metrics such as accuracy, precision, recall, and F1-score.
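
As an illustration of the kind of comparison described above, the following is a minimal sketch rather than the experimental pipeline used in this work. It assumes scikit-learn and its bundled copy of the Breast Cancer Wisconsin (Diagnostic) data. The 80/20 stratified split, the feature scaling for LR and KNN, the Gaussian variant of Naive Bayes, all hyperparameters, and the function name compare_models are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

MALIGNANT = 0  # scikit-learn encodes malignant as 0 and benign as 1


def compare_models(test_size=0.2, seed=0):
    """Fit five classifiers on one train/test split and return their metrics.

    The split ratio, seed, and hyperparameters are illustrative assumptions.
    """
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    models = {
        # LR and KNN are sensitive to feature scale, so they are standardized.
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "NB": GaussianNB(),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Malignant is treated as the positive class for precision/recall/F1.
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, pos_label=MALIGNANT),
            "recall": recall_score(y_test, y_pred, pos_label=MALIGNANT),
            "f1": f1_score(y_test, y_pred, pos_label=MALIGNANT),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_models().items():
        row = "  ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
        print(f"{name:>3}  {row}")
```

A single split gives only a rough ranking; repeating the procedure over several seeds or using cross-validation would give a more stable picture of which model performs best.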