dc.description.abstract | Breast cancer is the leading type of cancer among women worldwide, with about 2 million new cases
and 627,000 deaths every year. Breast tumors can be malignant or benign. Medical screening can be used to
determine the type of a diagnosed tumor. Alternatively, predictive modelling can be used to predict whether a tumor
is malignant or benign. However, the accuracy of the prediction algorithms is critical: a false negative can have dire
consequences, because a patient whose malignant tumor is missed will not be put under treatment, which can lead to death.
Conversely, a false positive may subject an individual to unnecessary stress and medication. Therefore, this
study sought to develop and validate new predictive models based on binary logistic regression, support vector machine
and extreme gradient boosting in order to improve the accuracy of breast tumor prediction. This study used
the Breast Cancer Wisconsin data set available on Kaggle. The dependent variable was whether a tumor is malignant
or benign. The regressors were the tumor features radius, texture, area, perimeter, smoothness, compactness,
concavity, concave points, symmetry and fractal dimension. Data analysis was done using the R statistical software and
involved generating descriptive statistics, data reduction, feature selection and model fitting. Before model fitting,
the reduced data were split into a training set and a validation set. The results
showed that the binary logistic regression, support vector machine and extreme gradient boosting models had predictive
accuracies of 96.97%, 98.01% and 97.73%, respectively, an improvement over existing models. These
results indicate that support vector machine and extreme gradient boosting predict tumor type more accurately
than binary logistic regression. This study therefore recommends the use of support vector machine and
extreme gradient boosting for breast tumor prediction, and further investigation of other
algorithms that could improve prediction accuracy. | en_US |
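The abstract reports accuracy alone while stressing that false negatives and false positives carry different costs. The sketch below assumes the standard confusion-matrix definition of accuracy (correct predictions divided by all predictions) and treats malignant as the positive class. The labels and the helper names `confusion_counts` and `summarize` are hypothetical illustrations, not the study's data or code.

```python
# Minimal sketch: accuracy and error types for a binary tumor classifier.
# Assumption: 1 = malignant (positive class), 0 = benign. Labels are hypothetical.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # benign tumor flagged as malignant
        else:
            if t == positive:
                fn += 1  # malignant tumor missed
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Accuracy plus the error counts that accuracy alone hides."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_negatives": fn,
        "false_positives": fp,
        # Share of malignant tumors that the model missed.
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one missed malignant tumor
    model_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # no missed malignant tumors
    print("model A:", summarize(y_true, model_a))
    print("model B:", summarize(y_true, model_b))
```

In this made-up example both models reach 80% accuracy, yet model A misses a malignant tumor and model B does not, which is why reporting false negatives alongside accuracy can be informative when comparing classifiers.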