SISA: Seeded Iterative Signature Algorithm for Biclustering Gene Expression Data

ABSTRACT One way to reduce the complexity of analyzing large-scale, genome-wide expression data is to group genes that show similar expression patterns into transcription modules (TMs). A TM is defined as a set of genes together with a set of conditions under which these genes are most tightly co-expressed. Many algorithms exist for the analysis of gene expression data. Most of them compute non-overlapping TMs, even though a gene may take part in more than one cellular activity and should therefore be allowed to belong to more than one TM. Existing algorithms such as the Signature Algorithm (SA) and the Iterative Signature Algorithm (ISA) compute overlapping TMs. SA requires prior biological information about co-regulated genes, which it takes as input, whereas ISA starts from a completely random gene seed. Generating good seeds for ISA is a challenging problem. In this paper, we present a way to generate an intelligent gene seed from the expression data itself, which eliminates the need for prior information about co-regulated genes. Experimental results were obtained for synthetic data as well as for expression data from the yeast Saccharomyces cerevisiae. The TMs obtained for the yeast data were found to be biologically and statistically significant according to the Gene Ontology database.
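
To make the seed-then-refine idea concrete, the following is a minimal sketch of an ISA-style iteration that grows a gene seed into a TM (a gene set plus a condition set). It is an illustration under stated assumptions, not the paper's method: the seeding procedure proposed here is not spelled out in the abstract, so the seed is simply passed in by the caller. The names `zscore` and `isa_refine`, the thresholds `t_gene` and `t_cond`, the one-sided thresholding, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def zscore(a, axis):
    """Standardise `a` to zero mean and unit variance along `axis`."""
    mu = a.mean(axis=axis, keepdims=True)
    sd = a.std(axis=axis, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)  # constant rows/columns stay at zero
    return (a - mu) / sd


def isa_refine(E, seed_genes, t_gene=2.0, t_cond=1.0, max_iter=50):
    """Refine a gene seed into a module (genes, conditions).

    Simplified ISA-style sketch (assumption): only up-regulated
    conditions and genes are kept, using one-sided thresholds.
    """
    E = np.asarray(E, dtype=float)
    E_g = zscore(E, axis=0)  # each condition normalised across genes
    E_c = zscore(E, axis=1)  # each gene normalised across conditions

    genes = np.zeros(E.shape[0], dtype=bool)
    genes[list(seed_genes)] = True
    conds = np.zeros(E.shape[1], dtype=bool)

    for _ in range(max_iter):
        if not genes.any():
            break
        # Score every condition by its mean expression over the gene set.
        cond_score = E_g[genes].mean(axis=0)
        conds = cond_score > cond_score.mean() + t_cond * cond_score.std()
        if not conds.any():
            break
        # Score every gene by its score-weighted mean over kept conditions.
        gene_score = E_c[:, conds] @ cond_score[conds] / conds.sum()
        new_genes = gene_score > gene_score.mean() + t_gene * gene_score.std()
        if np.array_equal(new_genes, genes):
            break  # fixed point reached
        genes = new_genes

    return np.flatnonzero(genes), np.flatnonzero(conds)


# Synthetic example (assumed data): a 20-gene x 10-condition module
# planted in Gaussian noise, refined from a seed of five module genes.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 40))
E[:20, :10] += 10.0
genes, conds = isa_refine(E, seed_genes=[0, 1, 2, 3, 4])
print(genes, conds)
```

Starting from a random seed instead of a seed inside the module would typically need many restarts before one converges to the planted module, which is the motivation for deriving the seed from the data.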
