Meta-clustering of gene expression data and literature-based information

The current tendency in the life sciences to spawn ever growing amounts of high-throughput assays has led to a situation where the interpretation of data and the formulation of hypotheses lag the pace at which information is produced. Although the first generation of statistical algorithms scrutinizing single, large-scale data sets found their way into the biological community, the great challenge to connect their results to existing knowledge still remains. Despite the fairly large number of biological databases that is currently available, a lot of relevant information is found in free-text format (such as textual annotations, scientific abstracts and full publications). In this paper we explore how an integrated analysis of expression data and literature-extracted information can reveal biologically meaningful clusters not identified when using microarray information alone. The joint analysis is validated in terms of transcriptional regulation.

[1]  Francis D. Gibbons,et al.  Judging the quality of gene expression-based clustering methods using gene annotation. , 2002, Genome research.

[2]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[3]  John Quackenbush,et al.  Computational genetics: Computational analysis of microarray data , 2001, Nature Reviews Genetics.

[4]  Russ B. Altman,et al.  A literature-based method for assessing the functional coherence of a gene group , 2003, Bioinform..

[5]  Russ B. Altman,et al.  Inclusion of Textual Documentation in the Analysis of Multidimensional Data Sets: Application to Gene Expression Data , 2003, Machine Learning.

[6]  Alexander E. Kel,et al.  TRANSFAC®: transcriptional regulation, from patterns to profiles , 2003, Nucleic Acids Res..

[7]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[8]  Michael Ruogu Zhang,et al.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. , 1998, Molecular biology of the cell.

[9]  D. Pe’er,et al.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data , 2003, Nature Genetics.

[10]  A. Valencia,et al.  Mining functional information associated with expression arrays , 2001, Functional & Integrative Genomics.

[11]  H. Bussemaker,et al.  Regulatory element detection using correlation with expression , 2001, Nature Genetics.

[12]  Esko Ukkonen,et al.  Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression Data , 2000, ISMB.

[13]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[14]  Nicola J. Rinaldi,et al.  Transcriptional Regulatory Networks in Saccharomyces cerevisiae , 2002, Science.

[15]  Ka Yee Yeung,et al.  Principal component analysis for clustering gene expression data , 2001, Bioinform..

[16]  B. De Moor,et al.  TXTGate: profiling gene groups with text-based information , 2004, Genome Biology.

[17]  Jung-Hsien Chiang,et al.  MeKE: Discovering the Functions of Gene Products from Biomedical Literature Via Sentence Alignment , 2003, Bioinform..

[18]  Bart De Moor,et al.  Text-Based Gene Profiling with Domain-Specific Views , 2003, SWDB.

[19]  William B. Frakes,et al.  Stemming Algorithms , 1992, Information Retrieval: Data Structures & Algorithms.

[20]  Thomas Lengauer,et al.  Analysis of Gene Expression Data with Pathway Scores , 2000, ISMB.

[21]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[22]  Mark J. van der Laan,et al.  A Method to Identify Significant Clusters in Gene Expression Data , 2002 .

[23]  Jeffrey T. Chang,et al.  The computational analysis of scientific literature to define and recognize gene expression clusters. , 2003, Nucleic acids research.

[24]  Jeffrey B. Colombe,et al.  Finding relevant references to genes and proteins in Medline using a Bayesian approach , 2002, Bioinform..

[25]  Kathleen Marchal,et al.  INCLUSive: a web portal and service registry for microarray and regulatory sequence analysis , 2003, Nucleic Acids Res..

[27]  Andreas D. Baxevanis,et al.  The Molecular Biology Database Collection: 2002 update , 2002, Nucleic Acids Res..

[28]  Isabelle Guyon,et al.  A Stability Based Method for Discovering Structure in Clustered Data , 2001, Pacific Symposium on Biocomputing.

[29]  T. Jenssen,et al.  A literature network of human genes for high-throughput analysis of gene expression , 2001 .

[30]  D. Chaussabel,et al.  Mining microarray expression data by literature profiling , 2002, Genome Biology.

[31]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[32]  Jun S. Liu,et al.  Integrating regulatory motif discovery and genome-wide expression analysis , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[33]  Hagit Shatkay,et al.  Information retrieval meets gene analysis , 2002 .

[34]  William Stafford Noble,et al.  Exploring Gene Expression Data with Class Scores , 2001, Pacific Symposium on Biocomputing.

[35]  David D. Lewis,et al.  An evaluation of phrasal and clustered representations on a text categorization task , 1992, SIGIR '92.

[36]  G. Church,et al.  Systematic determination of genetic network architecture , 1999, Nature Genetics.

[37]  M. Eisen,et al.  Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering , 2002, Genome Biology.

[38]  Jeffrey T. Chang,et al.  Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. , 2002, Genome research.

[39]  angesichts der Corona-Pandemie,et al.  UPDATE , 1973, The Lancet.

[40]  L Hunter,et al.  MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. , 1999, BioTechniques.

[41]  Ka Yee Yeung,et al.  Validating clustering for gene expression data , 2001, Bioinform..

[42]  Javed Mostafa,et al.  Detecting Gene Relations from MEDLINE Abstracts , 2000, Pacific Symposium on Biocomputing.

[43]  Yoshihiro Yamanishi,et al.  Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis , 2003, ISMB.

[44]  David R. Bickel,et al.  Robust Cluster Analysis of Microarray Gene Expression Data with the Number of Clusters Determined Biologically , 2003, Bioinform..

[45]  Jason Weston,et al.  Learning Gene Functional Classifications from Multiple Data Types , 2002, J. Comput. Biol..

[46]  B. De Moor,et al.  Comparison and meta-analysis of microarray data: from the bench to the computer desk. , 2003, Trends in genetics : TIG.

[47]  T. Jenssen,et al.  A literature network of human genes for high-throughput analysis of gene expression , 2001, Nature Genetics.

[48]  M. Vidal A Biological Atlas of Functional Maps , 2001, Cell.

[49]  Bart De Moor,et al.  Evaluation of the Vector Space Representation in Text-Based Gene Clustering , 2002, Pacific Symposium on Biocomputing.