Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures

Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm, EP_GOS_Clust.

Results: We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the proposed iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves.

Conclusion: The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse.
