Biclustering of Expression Microarray Data Using Affinity Propagation

Biclustering, the simultaneous clustering of genes and samples, is a challenging and important line of research in expression microarray data analysis. In this paper, we investigate the use of Affinity Propagation, a popular clustering method, to perform biclustering. Specifically, we cast Affinity Propagation into the Coupled Two-Way Clustering (CTWC) scheme, which allows a standard clustering technique to be used for biclustering. We extend the CTWC approach, adapting it to Affinity Propagation, by introducing a stability criterion and by devising an approach that automatically assembles pairs of stable clusters into biclusters. Empirical results on a synthetic biclustering benchmark show that our approach is highly competitive with the state of the art, achieving 91% accuracy in the worst case and 100% accuracy at all tested noise levels in the best case.
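To make the coupled two-way idea concrete, the following is a minimal sketch of a single two-way pass. Genes are clustered over all samples, samples are clustered over all genes, and every pair of clusters is kept as a candidate submatrix for later passes. It is not the method evaluated in the paper: the stability criterion and the bicluster assembly step are not reproduced, and the function names (`affinity_propagation`, `ap_clusters`, `two_way_pass`), the negative squared Euclidean similarity, the median preference, and the damping value are all assumptions chosen for illustration.

```python
from statistics import median


def affinity_propagation(S, damping=0.9, iters=500):
    """Message-passing Affinity Propagation on a similarity matrix S
    (list of lists, S[k][k] holds the preference of point k).
    Returns, for each point, the index of the exemplar it selects."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            best = vals[first]
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else best)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


def ap_clusters(vectors):
    """Cluster a list of equal-length vectors; returns lists of indices.
    Assumed choices: negative squared Euclidean similarity and a shared
    preference equal to the median off-diagonal similarity."""
    n = len(vectors)
    S = [[-sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[k]))
          for k in range(n)] for i in range(n)]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    groups = {}
    for i, exemplar in enumerate(affinity_propagation(S)):
        groups.setdefault(exemplar, []).append(i)
    return sorted(groups.values())


def two_way_pass(M):
    """One two-way pass on an expression matrix M (rows = genes,
    columns = samples). Every (gene cluster, sample cluster) pair is a
    candidate submatrix that a coupled scheme could cluster again."""
    genes = ap_clusters(M)
    samples = ap_clusters([list(col) for col in zip(*M)])
    candidates = [(g, s) for g in genes for s in samples]
    return genes, samples, candidates


# Toy matrix (hypothetical data): genes 0-2 are over-expressed in samples 0-2.
# The small offsets avoid exactly identical rows and columns.
offsets = [0, 1, 2, 0, 1, 2]
M = [[(5.0 if i < 3 and j < 3 else 0.0) + 0.1 * offsets[i] + 0.1 * offsets[j]
      for j in range(6)] for i in range(6)]
genes, samples, candidates = two_way_pass(M)
print(genes, samples, len(candidates))
```

In a full coupled scheme, each candidate pair would restrict the matrix to its genes and samples and be clustered again, with only clusters that pass a stability check retained. The sketch stops after the first pass so that the clustering step itself stays easy to inspect.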
