Diagnosis and biomarker identification on SELDI proteomics data by ADTBoost

Clinical proteomics is an emerging field that is expected to have a great impact on molecular diagnosis, identification of disease biomarkers, drug discovery and clinical trials in the post-genomic era. Protein profiling of tissues and fluids from disease and pathological-control samples, together with other proteomics techniques, will play an important role in molecular diagnosis, therapeutics and personalized healthcare. We introduce a robust diagnostic method based on the ADTboost algorithm, which is new to proteomics data analysis, to improve classification accuracy. It generates classification rules that are often compact and easy to interpret. The method also identifies the most discriminative features, which can be used as biomarkers for diagnostic purposes, and it provides a measure of prediction confidence. We applied the method to amyotrophic lateral sclerosis (ALS) data acquired by surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry. Cross-validation, ROC analysis and a comparative study show that the method has high predictive performance. Our molecular diagnosis method provides an efficient way to distinguish ALS from neurological controls. The results are expressed in a simple, straightforward alternating decision tree format or in conditional format. We identified the most discriminative peaks in the proteomic data, which can be used as biomarkers for diagnosis. ADTboost is useful not only for classifying proteomic data; it can also integrate clinical and imaging data from heterogeneous sources for early diagnosis. We expect it to have broad application in proteomics-based molecular diagnosis and personalized medicine.
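To illustrate how an alternating decision tree can yield both a class label and a prediction confidence, the following minimal Python sketch scores a sample by summing the prediction values along every branch the sample satisfies; the sign of the sum gives the class and its magnitude serves as the confidence. The tree shape, the peak labels (mz_A, mz_B, mz_C), the thresholds, the prediction values and the convention that a positive score means ALS are all hypothetical and chosen only for illustration; none of them are taken from the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PredictionNode:
    """Holds a real-valued contribution and any splitters attached below it."""
    value: float
    splitters: List["SplitterNode"] = field(default_factory=list)


@dataclass
class SplitterNode:
    """Tests one peak intensity against a threshold (hypothetical peaks)."""
    peak: str
    threshold: float
    if_true: PredictionNode   # followed when intensity < threshold
    if_false: PredictionNode  # followed otherwise


def score(node: PredictionNode, sample: Dict[str, float]) -> float:
    """Sum the prediction values of every node the sample reaches.

    Unlike an ordinary decision tree, all splitters under a prediction
    node are evaluated, so a sample follows several paths at once.
    """
    total = node.value
    for splitter in node.splitters:
        if sample[splitter.peak] < splitter.threshold:
            branch = splitter.if_true
        else:
            branch = splitter.if_false
        total += score(branch, sample)
    return total


def classify(root: PredictionNode, sample: Dict[str, float]) -> Tuple[str, float]:
    """Return (label, confidence); positive score = ALS is an assumed convention."""
    s = score(root, sample)
    return ("ALS" if s > 0 else "control", abs(s))


# Toy tree with made-up values, for illustration only.
TOY_TREE = PredictionNode(0.125, [
    SplitterNode("mz_A", 2.0,
                 if_true=PredictionNode(-0.5),
                 if_false=PredictionNode(0.75, [
                     SplitterNode("mz_C", 1.0,
                                  if_true=PredictionNode(0.25),
                                  if_false=PredictionNode(-0.25)),
                 ])),
    SplitterNode("mz_B", 5.0,
                 if_true=PredictionNode(0.5),
                 if_false=PredictionNode(-0.5)),
])

if __name__ == "__main__":
    print(classify(TOY_TREE, {"mz_A": 3.0, "mz_B": 4.0, "mz_C": 0.5}))
    print(classify(TOY_TREE, {"mz_A": 1.0, "mz_B": 6.0, "mz_C": 0.5}))
```

In this toy tree each splitter reads directly as a conditional rule on a single peak, which is the sense in which such trees are easy to interpret, and the peaks that appear in the splitters are the natural candidates for discriminative features.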