Biclustering of DNA Microarray Data: Theory, Evaluation, and Applications

This chapter discusses and compares methods and applications of biclustering algorithms for DNA microarray data analysis that have been developed in recent years. Identifying biologically significant clusters of genes from microarray experimental data is a daunting task that emerged largely with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles have been introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to biological problems are discussed.

Panayiotis V. Benos, University of Pittsburgh, USA
