Finding Correlated Biclusters from Gene Expression Data

Extracting biologically relevant information from DNA microarrays is an important task for drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been proposed for analyzing gene expression data, but conventional clustering algorithms often fail to produce satisfactory solutions on large, heterogeneous collections of such data. Biclustering algorithms have been presented as an alternative to standard clustering techniques for identifying local structures in gene expression data sets. These patterns may provide clues about the main biological processes associated with different physiological states. In this paper, we first introduce a bicluster pattern that is more general than existing ones: the correlated bicluster, which has an intuitive biological interpretation. We then propose a novel transform technique based on singular value decomposition, which converts the problem of identifying correlated biclusters in a gene expression matrix into two global clustering problems. The Mixed-Clustering algorithm and the Lift algorithm are devised to efficiently produce δ-corBiclusters. The biclusters obtained with our method from gene expression data sets of multiple human organs and the yeast Saccharomyces cerevisiae demonstrate clear biological meanings.
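
To make the notion of a correlated bicluster concrete, the following minimal sketch checks whether a submatrix of a toy expression matrix is row-wise correlated. The abstract does not state the formal definition of a δ-corBicluster, so the criterion used here (every pair of rows in the submatrix has absolute Pearson correlation of at least 1 − δ) is an assumption made for illustration only. The function names, the toy data, and the value of δ are likewise hypothetical, and the sketch implements neither the Mixed-Clustering algorithm nor the Lift algorithm.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def min_abs_row_correlation(matrix, rows, cols):
    """Smallest |Pearson r| over all row pairs of the submatrix rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    return min(abs(pearson(a, b)) for a, b in combinations(sub, 2))


def is_correlated_bicluster(matrix, rows, cols, delta):
    """Assumed criterion (not from the abstract): all row pairs have |r| >= 1 - delta."""
    return min_abs_row_correlation(matrix, rows, cols) >= 1 - delta


# Toy expression matrix: genes (rows) x conditions (columns).
base = [1.0, 4.0, 2.0, 5.0]
expr = [
    base + [9.0, 0.0],                           # gene 0: reference profile
    [2 * v + 3 for v in base] + [1.0, 7.0],      # gene 1: scaled and shifted copy
    [-0.5 * v + 10 for v in base] + [4.0, 4.5],  # gene 2: inverted, scaled, shifted copy
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],              # gene 3: unrelated profile
]

# Genes 0-2 are linearly related on the first four conditions only.
print(is_correlated_bicluster(expr, [0, 1, 2], [0, 1, 2, 3], delta=0.05))     # True
# Adding the unrelated gene 3 breaks the pattern.
print(is_correlated_bicluster(expr, [0, 1, 2, 3], [0, 1, 2, 3], delta=0.05))  # False
```

Under this assumed criterion, shifted, scaled, and inverted copies of a profile all fall into the same bicluster, because any exact linear relation between two rows yields a Pearson correlation of ±1. This is one sense in which a correlation-based pattern is more general than pure shifting or scaling patterns.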
