Outlier Analysis and Top Scoring Pair for Integrated Data Analysis and Biomarker Discovery

Pathway deregulation has been identified as a key driver of carcinogenesis, with proteins in signaling pathways serving as primary targets for drug development. Deregulation can be driven by several kinds of molecular events, including gene mutation, epigenetic changes in gene promoters, overexpression, and gene amplifications or deletions. We present a novel approach that identifies pathways of interest by integrating outlier analysis within and across molecular data types with gene set analysis. We then use the results to seed the top-scoring pair algorithm, which identifies robust biomarkers associated with pathway deregulation. We apply this methodology to pediatric acute myeloid leukemia (AML) data: we develop a biomarker in primary AML tumors, demonstrate its robustness on an independent primary tumor data set, and show that the identified biomarkers also perform well in relapsed pediatric AML tumors.
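To make the top-scoring pair step concrete, the following is a minimal sketch of the idea as it is commonly described: among candidate genes, choose the pair whose relative expression ordering differs most between two classes, then classify a new sample by checking that ordering. It is an illustration under stated assumptions, not the authors' implementation. The function names (`top_scoring_pair`, `classify`), the gene labels, and the toy values are hypothetical, and the seeding step is modelled here simply as restricting the search to a supplied list of candidate genes.

```python
from itertools import combinations


def top_scoring_pair(expr, labels, candidates=None):
    """Return the best-scoring gene pair among the candidate genes.

    expr       : dict mapping gene name -> list of values, one per sample
    labels     : list of 0/1 class labels, one per sample
    candidates : optional iterable of gene names that seeds the search
                 (hypothetical stand-in for genes from pathways of interest)
    """
    genes = sorted(candidates if candidates is not None else expr)
    idx0 = [k for k, y in enumerate(labels) if y == 0]
    idx1 = [k for k, y in enumerate(labels) if y == 1]
    if not idx0 or not idx1:
        raise ValueError("both classes need at least one sample")

    best = None
    for a, b in combinations(genes, 2):
        # Fraction of samples in each class where gene a is below gene b.
        p0 = sum(expr[a][k] < expr[b][k] for k in idx0) / len(idx0)
        p1 = sum(expr[a][k] < expr[b][k] for k in idx1) / len(idx1)
        score = abs(p0 - p1)
        # Strict ">" keeps the first pair found when scores tie.
        if best is None or score > best["score"]:
            best = {
                "pair": (a, b),
                "score": score,
                # True if "a < b" is more frequent in class 1 than in class 0.
                "a_below_b_means_class1": p1 > p0,
            }
    return best


def classify(sample, rule):
    """Assign class 0 or 1 to one sample (dict gene -> value) using the pair rule."""
    a, b = rule["pair"]
    a_below_b = sample[a] < sample[b]
    return 1 if a_below_b == rule["a_below_b_means_class1"] else 0


# Toy data: three hypothetical genes, six samples (three per class).
toy_expr = {
    "G1": [1, 2, 1, 5, 6, 7],
    "G2": [3, 4, 5, 2, 1, 3],
    "G3": [2, 2, 2, 2, 2, 2],
}
toy_labels = [0, 0, 0, 1, 1, 1]
toy_rule = top_scoring_pair(toy_expr, toy_labels, candidates=["G1", "G2", "G3"])
```

Because the rule depends only on the ordering of two measurements within a single sample, it is unaffected by monotone rescaling of that sample's values, which is one reason such pairs are attractive as biomarkers that need to hold up across data sets.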
