Integration of Ranked Lists via Cross Entropy Monte Carlo with Applications to mRNA and microRNA Studies

One of the major challenges facing researchers studying complex biological systems is the integration of data from -omics platforms. Omic-scale data include DNA variations, transcriptome profiles, and RNAomics. The choice of an appropriate approach for a data-integration task is problem-dependent and is dictated primarily by the information contained in the data. When jointly modeling multiple raw datasets is extremely challenging because the datasets differ vastly, rankings from each dataset provide a common basis on which results can be integrated. Aggregating microRNA targets predicted by different computational algorithms is one such problem. Integrating results from multiple mRNA studies based on different platforms is another example that will be discussed. Formulating the integration of ranked lists as the minimization of an objective criterion, we explore the use of a cross entropy Monte Carlo method for solving this combinatorial problem. Instead of placing a discrete uniform distribution on all potential solutions, an iterative importance sampling technique is used "to slowly tighten the net" so that most of the distributional mass is placed on the optimal solution and its neighbors. Extensive simulation studies were performed to assess the performance of the method. With satisfactory simulation results, the method was applied to the microRNA and mRNA problems to illustrate its utility.
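The iterative scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions made here for brevity: every input list is a full ranking of the same items, the objective criterion is the summed Spearman footrule distance, and the sampling distribution is an item-by-position probability matrix updated from the best-scoring ("elite") samples. The function names and the parameters `n_samples`, `elite_frac`, `smoothing`, and `n_iter` are hypothetical.

```python
import random


def footrule(candidate, ranked_list):
    """Spearman footrule distance between two orderings of the same items."""
    pos = {item: r for r, item in enumerate(ranked_list)}
    return sum(abs(r - pos[item]) for r, item in enumerate(candidate))


def objective(candidate, ranked_lists):
    """Assumed criterion: total footrule distance to all input lists."""
    return sum(footrule(candidate, lst) for lst in ranked_lists)


def sample_ordering(items, prob, rng):
    """Draw one ordering, filling positions left to right without replacement."""
    remaining = list(items)
    ordering = []
    for position in range(len(items)):
        # A small floor keeps every remaining item selectable.
        weights = [prob[item][position] + 1e-12 for item in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ordering.append(chosen)
        remaining.remove(chosen)
    return ordering


def cross_entropy_aggregate(ranked_lists, n_samples=500, elite_frac=0.1,
                            smoothing=0.7, n_iter=50, seed=0):
    """Simplified cross-entropy Monte Carlo search for an aggregate ranking."""
    rng = random.Random(seed)
    items = sorted(ranked_lists[0])
    n = len(items)
    # Start from a uniform distribution: any item is equally likely anywhere.
    prob = {item: [1.0 / n] * n for item in items}
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        samples = [sample_ordering(items, prob, rng) for _ in range(n_samples)]
        samples.sort(key=lambda s: objective(s, ranked_lists))
        elite = samples[:n_elite]
        top_score = objective(elite[0], ranked_lists)
        if top_score < best_score:
            best, best_score = elite[0], top_score
        # "Tighten the net": move mass toward positions seen in elite samples.
        for item in items:
            for position in range(n):
                freq = sum(1 for s in elite if s[position] == item) / n_elite
                prob[item][position] = (smoothing * freq
                                        + (1 - smoothing) * prob[item][position])
    return best, best_score
```

Calling `cross_entropy_aggregate(lists)` on a list of rankings returns the best ordering found and its objective value. The smoothing step blends the elite frequencies with the previous matrix so that the distribution contracts gradually instead of collapsing onto the first good sample.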
