BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis

Biclustering is a data mining technique used in gene expression analysis to identify patterns in which different genes are correlated over a subset of conditions. Association rule mining is an efficient approach to biclustering, as in the BIMODULE algorithm, but it is sensitive to the values of its input parameters and to the discretization procedure used in the preprocessing step. Moreover, when noise is present, classical association rule miners discover multiple small fragments of the true bicluster but miss the true bicluster itself. This paper formally presents a generalized, noise-tolerant bicluster model termed μBicluster. Based on this model, we introduce an iterative algorithm, BIDENS, that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions in order to preserve meaningful and significant biclusters. The proposed algorithm can discover biclusters that are hard for BIMODULE to discover. An experimental study on yeast and human gene expression data and on several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.

Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.
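The fragmentation effect described in the abstract can be illustrated with a toy example. The sketch below is not the μBicluster model or the BIDENS algorithm, neither of which is specified here. It assumes a discretized 0/1 expression matrix, a planted 4x4 bicluster with one noisy cell, and a hypothetical density threshold of 0.9. An exact (all-ones) test, as a classical itemset miner would apply, rejects the full block and accepts only smaller fragments, whereas a density-based test accepts the whole block.

```python
# Toy illustration only: helper names and the 0.9 threshold are assumptions,
# not part of the paper's model.

def cells(matrix, rows, cols):
    """Flatten the submatrix selected by the given row and column indices."""
    return [matrix[r][c] for r in rows for c in cols]

def is_exact(matrix, rows, cols):
    """Exact test: every selected cell must be 1."""
    return all(v == 1 for v in cells(matrix, rows, cols))

def density(matrix, rows, cols):
    """Fraction of selected cells that are 1."""
    selected = cells(matrix, rows, cols)
    return sum(selected) / len(selected)

# 6 genes x 6 conditions, all zeros, with a planted 4x4 block of ones.
data = [[0] * 6 for _ in range(6)]
block_rows = [0, 1, 2, 3]
block_cols = [0, 1, 2, 3]
for r in block_rows:
    for c in block_cols:
        data[r][c] = 1
data[1][2] = 0  # a single noisy cell inside the planted block

DENSITY_THRESHOLD = 0.9  # hypothetical tolerance level

exact_full = is_exact(data, block_rows, block_cols)       # full block fails
block_density = density(data, block_rows, block_cols)     # 15 of 16 cells are 1
tolerant_full = block_density >= DENSITY_THRESHOLD        # full block passes

# The exact test only recovers fragments that avoid the noisy cell.
fragment_without_row = is_exact(data, [0, 2, 3], block_cols)
fragment_without_col = is_exact(data, block_rows, [0, 1, 3])

print(exact_full, block_density, tolerant_full)
print(fragment_without_row, fragment_without_col)
```

With a single flipped cell, the exact test splits the planted block into a 3x4 and a 4x3 fragment, which is the behavior a noise-tolerant model is meant to avoid.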
