MiningABs: mining associated biomarkers across multi-connected gene expression datasets

Background
Human disease often arises as a consequence of alterations in a set of associated genes rather than alterations to a set of unassociated individual genes. Most previous microarray-based meta-analyses identified disease-associated genes or biomarkers independent of genetic interactions. Therefore, in this study, we present the first meta-analysis method capable of taking gene combination effects into account to efficiently identify associated biomarkers (ABs) across different microarray platforms.

Results
We propose a new meta-analysis approach called MiningABs to mine ABs across different array-based datasets. The similarity between paired probe sequences is quantified as a bridge to connect these datasets together. The ABs can be subsequently identified from an “improved” common logit model (c-LM) by combining several sibling-like LMs in a heuristic genetic algorithm selection process. Our approach is evaluated with two sets of gene expression datasets: i) 4 esophageal squamous cell carcinoma and ii) 3 hepatocellular carcinoma datasets. Based on an unbiased reciprocal test, we demonstrate that each gene in a group of ABs is required to maintain high cancer sample classification accuracy, and we observe that ABs are not limited to genes common to all platforms. Investigating the ABs using Gene Ontology (GO) enrichment, literature survey, and network analyses indicated that our ABs are not only strongly related to cancer development but also highly connected in a diverse network of biological interactions.

Conclusions
The proposed meta-analysis method called MiningABs is able to efficiently identify ABs from different independently performed array-based datasets, and we show its validity in cancer biology via GO enrichment, literature survey and network analyses. We postulate that the ABs may facilitate novel target and drug discovery, leading to improved clinical treatment.
Java source code, tutorial, example and related materials are available at “http://sourceforge.net/projects/miningabs/”.
