CS 229 Project: A Machine Learning Approach to Stroke Risk Prediction

In this paper, we consider the prediction of stroke using the Cardiovascular Health Study (CHS) dataset. We performed missing-data imputation, feature selection, and feature aggregation before training models and making predictions on the dataset. We then used a combination of a support vector machine (SVM) and a stratified Cox proportional hazards model to predict the occurrence of stroke. Several methods and model variants were evaluated and compared to identify the best-performing algorithm.