BiRange: An Efficient Framework for Biclustering of Gene Expression Data Using Range Bipartite Graph

Biclustering is a vital data mining tool that is commonly employed on microarray data sets for analysis tasks in bioinformatics research and medical applications. There has been extensive research on biclustering of gene expression data arising from microarray experiments. This technique is an important analysis tool in gene expression measurement when some genes have multiple functions and experimental conditions are diverse. In this paper, we introduce a new framework for biclustering of gene expression data. The basis of this framework is the construction of a range bipartite graph for the representation of 2-dimensional gene expression data. We construct this range bipartite graph by partitioning the set of experimental conditions into two disjoint sets. The key benefit of this representation is that it leads to a compact representation of all similar value ranges between experimental conditions. Based on this problem formulation, we propose an efficient algorithm that searches for constrained maximal cliques in this range bipartite graph in order to extract a set of biclusters. Our technique is scalable to practical gene expression data and can produce different types of biclusters amid noise. The experimental evaluation of this technique also reveals its accuracy and effectiveness with respect to noise handling and execution time in comparison to other similar techniques.
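As a rough illustration of the graph construction described above, the following minimal Python sketch partitions the conditions into two disjoint sets and, for each cross pair of conditions, records maximal groups of genes whose expression ratios fall within a narrow range. The abstract does not specify how similar value ranges are measured, so the use of ratios, the function name `range_edges`, and the parameters `eps` and `min_genes` are all assumptions made for illustration. The constrained maximal clique search is not shown.

```python
from itertools import product

def range_edges(data, left, right, eps=0.1, min_genes=2):
    """Build labelled edges between two disjoint sets of conditions.

    data: dict mapping gene name -> list of expression values.
    left, right: disjoint lists of condition (column) indices.
    eps, min_genes: hypothetical thresholds, not taken from the paper.
    Returns tuples (a, b, range_low, range_high, genes).
    """
    assert not set(left) & set(right), "condition sets must be disjoint"
    edges = []
    for a, b in product(left, right):
        # Assumption: similarity between two conditions is measured by
        # the per-gene expression ratio, sorted in ascending order.
        ratios = sorted(
            (row[a] / row[b], g) for g, row in data.items() if row[b] != 0
        )
        last_end = -1
        end = 0
        for start in range(len(ratios)):
            end = max(end, start)
            # Extend the window while the ratio range stays within eps.
            while (end + 1 < len(ratios)
                   and ratios[end + 1][0] - ratios[start][0] <= eps):
                end += 1
            # The window is maximal only if it reaches past the previous one.
            if end > last_end and end - start + 1 >= min_genes:
                genes = frozenset(g for _, g in ratios[start:end + 1])
                edges.append((a, b, ratios[start][0], ratios[end][0], genes))
            last_end = end
    return edges

# Toy data: g1 and g2 are scaled copies of each other, g3 is unrelated.
toy = {
    "g1": [1.0, 2.0, 2.0, 4.0],
    "g2": [2.0, 4.0, 4.0, 8.0],
    "g3": [1.0, 5.0, 3.0, 1.0],
}
edges = range_edges(toy, left=[0, 1], right=[2, 3])
for a, b, low, high, genes in edges:
    print(a, b, round(low, 3), round(high, 3), sorted(genes))
```

In this toy input, every cross pair of conditions yields an edge whose gene set is `g1` and `g2`. Under the same assumptions, a bicluster candidate would then correspond to a group of edges that share genes and connect every selected left condition to every selected right condition.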
