Abstract 5085: TopModel: An online resource for predictive models in cancer

One goal of characterizing the genome-wide landscape of cancer cells is to identify signatures that predict onset, progression, and treatment outcome. Many computational approaches have been developed to discover gene signatures, with varying degrees of success. The challenge remains to identify the approach that, when trained on one cohort, stays accurate when predicting outcomes in an unseen cohort. Thus far, no clear themes have emerged to indicate which method works best for a particular task. We have built a system called TopModel that facilitates the identification of top-performing machine-learning algorithms for a series of cancer-genomics challenges. The system has four components: 1) a benchmark that includes several cancer genomics datasets with outcome variables as prediction targets; 2) a database of results derived from applying thousands of combinations of machine-learning and feature-selection methods; 3) a web interface that allows bioinformaticians to evaluate their own prediction results; and 4) a web interface that allows a biomedical researcher to upload data on a sample or set of samples and receive a report on the signatures predicted to exist in the sample(s). The cancer benchmark component provides a common ground for developing and evaluating prediction methods for variables such as cancer subtype, drug response, and survival. Several datasets have been loaded, including survival prediction in the TCGA cohorts and hundreds of drug sensitivities in several cancer cell line cohorts. We demonstrate the utility of the resource by comparing state-of-the-art feature selection methods to a new approach that uses locality on a genetic interaction network. We evaluate performance in terms of how well the selected features generalize across datasets, weighed against prediction accuracy.
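The cross-cohort evaluation described above can be illustrated with a minimal sketch. It is not the TopModel implementation: the synthetic cohorts, the mean-difference feature ranking, the nearest-centroid classifier, and every function name are assumptions chosen only to show the pattern of selecting features and fitting on one cohort, then scoring on a cohort the model has never seen.

```python
import random
from statistics import mean


def make_cohort(rng, n_samples, n_genes, informative, shift):
    """Build a synthetic expression matrix (hypothetical data, not TCGA).

    Informative genes are shifted upward in class 1; all others are noise.
    """
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Rank genes by absolute difference in class means; keep the top k."""
    scores = []
    for g in range(len(X[0])):
        m0 = mean(row[g] for row, lab in zip(X, y) if lab == 0)
        m1 = mean(row[g] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(m1 - m0), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])


def fit_centroids(X, y, genes):
    """Compute one centroid per class over the selected genes."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((row[g] - cv) ** 2 for g, cv in zip(genes, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def accuracy(centroids, X, y, genes):
    hits = sum(predict(centroids, row, genes) == lab for row, lab in zip(X, y))
    return hits / len(y)


def cross_cohort_eval(train, test, k):
    """Select features and fit on the training cohort only, then score both."""
    X_tr, y_tr = train
    X_te, y_te = test
    genes = select_top_k(X_tr, y_tr, k)
    model = fit_centroids(X_tr, y_tr, genes)
    return {
        "genes": genes,
        "train_acc": accuracy(model, X_tr, y_tr, genes),
        "test_acc": accuracy(model, X_te, y_te, genes),
    }


# Two independent synthetic cohorts that share the same informative genes.
rng = random.Random(0)
informative = [0, 1, 2, 3, 4]
cohort_a = make_cohort(rng, 40, 50, informative, shift=3.0)
cohort_b = make_cohort(rng, 40, 50, informative, shift=3.0)

result = cross_cohort_eval(cohort_a, cohort_b, k=5)
print(result)
```

The gap between `train_acc` and `test_acc` is the quantity of interest here: a method whose selected features carry over to the unseen cohort keeps that gap small. Feature selection is deliberately run on the training cohort alone, so that no information from the held-out cohort leaks into the model.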
In addition to identifying high-value genome features, we explore the robustness of the cancer state in the absence of these features. We simulate gene knockouts by disconnecting these features in our pathway models, re-inferring the pathway interaction network without them, and then reassessing the result with the top-performing predictive models of cancer.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 5085. doi:1538-7445.AM2012-5085