Comparative studies for developing a protein-based cancer prediction model to maximise the ROC-AUC with various variable selection methods

Protein data analysis is becoming increasingly practical as quantification experiments such as multiple reaction monitoring (MRM) grow more accurate. Protein data are easier to obtain than genetic variant or gene expression data, which makes them more suitable for the early diagnosis of cancer. Because each patient shows a unique protein pattern, researchers must select effective markers in order to build a model that predicts patients' disease status consistently. This research focuses on finding the most effective variable selection method for the early diagnosis of pancreatic cancer. We compare three kinds of method: classical stepwise selection based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC); a machine-learning-based method, support vector machine recursive feature elimination (SVM-RFE); and stepwise selection that uses the area under the receiver operating characteristic curve as its criterion (Step-AUC). Based on simulation and real data analysis, we recommend the Step-AUC method for maximising the predictive performance of early diagnosis from protein data.
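To make the idea of AUC-driven stepwise selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a forward-only search, a plain logistic regression fitted by gradient descent as the scoring model, and in-sample AUC as the criterion. The function names (`auc`, `fit_logistic`, `step_auc`) and the `min_gain` stopping threshold are hypothetical choices made for illustration.

```python
import math

def auc(scores, labels):
    """ROC-AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def standardise(col):
    """Centre a marker column and scale it to unit variance."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
    return [(v - mean) / sd for v in col]

def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; return the linear scores."""
    k, n = len(rows[0]), len(rows)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return [b + sum(wj * xj for wj, xj in zip(w, x)) for x in rows]

def step_auc(columns, labels, min_gain=1e-3):
    """Forward selection: add the marker that raises AUC most, stop when the gain is small.

    `columns` is a list of marker columns (one list of values per marker).
    Returns the selected column indices and the AUC of the final model.
    """
    cols = [standardise(c) for c in columns]
    selected, best = [], 0.5  # 0.5 is the AUC of a model with no markers
    while len(selected) < len(cols):
        trials = []
        for j in range(len(cols)):
            if j in selected:
                continue
            idx = selected + [j]
            rows = [[cols[i][r] for i in idx] for r in range(len(labels))]
            trials.append((auc(fit_logistic(rows, labels), labels), j))
        score, j = max(trials, key=lambda t: t[0])
        if score - best < min_gain:
            break
        selected.append(j)
        best = score
    return selected, best
```

In practice the in-sample AUC used here is optimistic, so a cross-validated AUC would be a more reasonable criterion, and a bidirectional search that also drops markers may select a smaller set. AIC- or BIC-based stepwise selection follows the same loop with the criterion swapped.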