AUREA: an open-source software system for accurate and user-friendly identification of relative expression molecular signatures

Background: Public databases such as the NCBI Gene Expression Omnibus contain extensive and exponentially increasing amounts of high-throughput data that can be applied to molecular phenotype characterization. Collectively, these data can be analyzed for purposes such as disease diagnosis or phenotype classification. One family of algorithms that has proven useful for disease classification is based on relative expression analysis and includes the Top-Scoring Pair (TSP), k-Top-Scoring Pairs (k-TSP), Top-Scoring Triplet (TST) and Differential Rank Conservation (DIRAC) algorithms. These relative expression analysis algorithms hold significant advantages for identifying interpretable molecular signatures for disease classification, and have been implemented previously on a variety of computational platforms with varying degrees of usability. To increase the user base and maximize the utility of these methods, we developed the program AUREA (Adaptive Unified Relative Expression Analyzer), a cross-platform tool that has a consistent application programming interface (API), an easy-to-use graphical user interface (GUI), fast running times and automated parameter discovery.

Results: Herein, we describe AUREA, an efficient, cohesive, and user-friendly open-source software system that comprises a suite of methods for relative expression analysis. AUREA incorporates existing methods while extending their capabilities and bringing uniformity to their interfaces. We show that combining these algorithms and adaptively tuning their parameters on the training sets makes their performance more consistent, and we demonstrate the effectiveness of our adaptive parameter tuner by comparing accuracy across diverse datasets.

Conclusions: We have integrated several relative expression analysis algorithms and provided a unified interface for their implementation while making data acquisition, parameter fixing, data merging, and results analysis ‘point-and-click’ simple. The unified interface and the adaptive parameter tuning of AUREA provide an effective framework in which both ‘in silico’ and ‘bench’ scientists can investigate the massive amounts of publicly available data. AUREA can be found at http://price.systemsbiology.net/AUREA/.
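For readers unfamiliar with relative expression analysis, the following is a minimal sketch of the basic Top-Scoring Pair idea: pick the gene pair whose ordering differs most between two classes, then classify a new sample by that ordering alone. It is not AUREA's API or implementation. The function names `tsp_train` and `tsp_predict`, the dictionary layout, and the toy data are all hypothetical, and ties between equally scoring pairs are resolved here by simply keeping the first pair found, which is a simplification.

```python
from itertools import combinations


def tsp_train(samples, labels):
    """Find the gene pair (i, j) whose ordering best separates two classes.

    samples: list of expression profiles (each a list of floats, same length)
    labels:  list of 0/1 class labels, one per sample
    Hypothetical helper for illustration only; not part of AUREA.
    """
    n_genes = len(samples[0])
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]

    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(s[i] < s[j] for s in class0) / len(class0)
        p1 = sum(s[i] < s[j] for s in class1) / len(class1)
        score = abs(p0 - p1)
        # Keep the first pair that reaches the highest score (simplified tie rule).
        if best is None or score > best[0]:
            best = (score, i, j, p1 > p0)

    score, i, j, less_means_class1 = best
    return {"i": i, "j": j, "score": score, "less_means_class1": less_means_class1}


def tsp_predict(model, sample):
    """Classify one sample from the ordering of the selected gene pair."""
    less = sample[model["i"]] < sample[model["j"]]
    return 1 if less == model["less_means_class1"] else 0


# Toy data (made up): three genes, three samples per class.
samples = [
    [1.0, 5.0, 3.0], [2.0, 6.0, 2.5], [0.5, 4.0, 3.0],  # class 0
    [5.0, 1.0, 3.2], [6.0, 2.0, 2.8], [4.0, 0.5, 2.9],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

model = tsp_train(samples, labels)
prediction = tsp_predict(model, [1.5, 4.0, 3.0])
```

Because the decision depends only on which of two genes is more highly expressed within a sample, the rule is unchanged by any monotone per-sample rescaling of the data, which is one reason such signatures are easy to interpret.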
