BARTMAP: A viable structure for biclustering

Clustering has been used extensively to analyze high-throughput messenger RNA (mRNA) expression profiles obtained with microarrays. It has also proven essential in microRNA expression profiling, which shows enormous promise for cancer diagnosis and treatment, gene function identification, therapy development and drug testing, and genetic regulatory network inference. However, conventional clustering is inherently limited: when samples or conditions are clustered, many genes are uncorrelated with the resulting groups, and when genes are clustered, many samples or conditions are unrelated to them. Biclustering addresses these problems by clustering both dimensions simultaneously, in effect integrating feature selection into clustering without any prior information, so that relations between clusters of genes (more generally, features) and clusters of samples or conditions (data objects) are established. However, the NP-complete complexity of the problem poses a great challenge to computational methods for identifying such local relations. Here, we propose and demonstrate that a neural-based classifier, ARTMAP, can be modified to perform biclustering efficiently, leading to a biclustering algorithm called Biclustering ARTMAP (BARTMAP). Experimental results on multiple human cancer data sets show that BARTMAP can achieve clustering structures of higher quality than those obtained with other commonly used biclustering or clustering algorithms, and with fast run times.
