Multiple Co-clusterings

The goal of multiple clusterings is to discover multiple independent ways of organizing a dataset into clusters. Current approaches to this problem focus only on one-way clustering. In many real-world applications, though, it is meaningful and desirable to explore alternative two-way clusterings (or co-clusterings), where both samples and features are clustered. To tackle this challenging and unexplored problem, in this paper we introduce an approach, called Multiple Co-Clusterings (MultiCC), to discover non-redundant alternative co-clusterings. MultiCC uses matrix tri-factorization to optimize the sample-wise and feature-wise co-clustering indicator matrices, and introduces two non-redundancy terms to enforce diversity among co-clusterings. We then combine the matrix tri-factorization objective and the two non-redundancy terms into a unified objective function and introduce an iterative solution to optimize it. Experimental results show that MultiCC outperforms existing multiple clustering methods and can find interesting co-clusters that cannot be discovered by current solutions.
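
To make the tri-factorization idea concrete, here is a minimal sketch of a single co-clustering obtained by plain nonnegative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates. It is not the MultiCC algorithm: the abstract does not specify the two non-redundancy terms, so they are omitted, and the function name `nmtf`, its parameters, and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Illustrative nonnegative tri-factorization X ~ F @ S @ G.T.

    F (n x k_rows) acts as a soft sample-cluster indicator,
    G (m x k_cols) as a soft feature-cluster indicator,
    S (k_rows x k_cols) links sample clusters to feature clusters.
    Hypothetical helper; no non-redundancy terms are included.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the squared Frobenius reconstruction error.
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors

# Toy data (assumed): a 20 x 16 matrix with a 2 x 2 block structure plus noise.
rng = np.random.default_rng(1)
blocks = np.array([[5.0, 1.0], [1.0, 5.0]])
X = np.kron(blocks, np.ones((10, 8))) + 0.1 * rng.random((20, 16))

F, S, G, errors = nmtf(X, k_rows=2, k_cols=2)
row_labels = F.argmax(axis=1)  # hard sample-cluster assignment
col_labels = G.argmax(axis=1)  # hard feature-cluster assignment
```

Taking the arg-max over the rows of F and G is one simple way to read hard sample and feature clusters off the factors. A multiple co-clustering method in the spirit described above would fit several such factorizations jointly and penalize overlap between them.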
