Shifting-and-Scaling Correlation Based Biclustering Algorithm

Genes in a biologically significant group can be correlated in several different ways, which makes it difficult to develop effective methods for analyzing gene expression data. Computational biologists initially worked only with absolute and shifting correlations. Researchers have since found that handling shifting-and-scaling correlation lets them extract more biologically relevant and interesting patterns from gene microarray data. In this paper, we introduce a shifting-and-scaling correlation measure named Shifting and Scaling Similarity (SSSim), which can detect highly correlated gene pairs in any gene expression data set. We also introduce a biclustering algorithm named Intensive Correlation Search (ICS), which uses SSSim to extract biologically significant biclusters from a gene expression data set. The technique performs satisfactorily on a number of benchmark gene expression data sets when evaluated in terms of functional categories in the Gene Ontology database.
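To make the notion of shifting-and-scaling correlation concrete, the following minimal sketch builds two expression profiles related by g2 = scale * g1 + shift and compares them. It does not implement SSSim, whose definition is not given here; the Pearson correlation coefficient is used only as a familiar stand-in that is invariant to shifting and positive scaling. The function names and the sample values are hypothetical and chosen for illustration.

```python
import math


def shift_scale(profile, scale, shift):
    """Return a profile related to the input by g' = scale * g + shift."""
    return [scale * v + shift for v in profile]


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient (stand-in measure, NOT SSSim)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression values of one gene over five conditions.
base = [1.0, 3.0, 2.0, 5.0, 4.0]

# Shifting pattern only, and a combined shifting-and-scaling pattern.
shifted = shift_scale(base, scale=1.0, shift=10.0)
scaled_shifted = shift_scale(base, scale=2.5, shift=-4.0)

# A profile with a different trend, for contrast.
unrelated = [5.0, 1.0, 4.0, 2.0, 3.0]

for name, other in [("shifted", shifted),
                    ("scaled_shifted", scaled_shifted),
                    ("unrelated", unrelated)]:
    print(name,
          "distance=%.3f" % euclidean(base, other),
          "correlation=%.3f" % pearson(base, other))
```

The shifted and scaled profiles lie far from the base profile in absolute terms, yet they follow the same trend across conditions. A distance-based measure misses this relationship, whereas a measure that tolerates shifting and scaling captures it.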
