Directional co-clustering

Co-clustering addresses the problem of simultaneously clustering both dimensions of a data matrix. For high-dimensional sparse data, co-clustering turns out to be more beneficial than one-sided clustering, even when one is interested in clustering along one dimension only. Beyond being high dimensional and sparse, some datasets, such as document-term matrices, also exhibit directional characteristics, and it is useful to $L_2$-normalize such data so that it lies on the surface of a unit hypersphere. Popular co-clustering assumptions such as Gaussian or Multinomial are inadequate for this type of data. In this paper, we extend the scope of co-clustering to directional data. We present the diagonal block mixture of von Mises–Fisher distributions (dbmovMFs), a co-clustering model well suited for directional data lying on a unit hypersphere. By deriving the model parameter estimates under both the maximum likelihood (ML) and classification ML approaches, we develop a class of EM algorithms for fitting dbmovMFs to data. Extensive experiments on several real-world datasets confirm the advantage of our approach and demonstrate the effectiveness of our algorithms.
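The $L_2$ normalization mentioned above can be sketched as follows; this is a minimal illustration (not code from the paper), assuming a dense NumPy array of document-term counts with rows as documents:

```python
import numpy as np

def l2_normalize_rows(X, eps=1e-12):
    """Project each row of X onto the unit hypersphere via L2 normalization."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero.
    return X / np.maximum(norms, eps)

# Toy document-term count matrix (rows = documents, columns = terms).
X = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 2.0]])
Xn = l2_normalize_rows(X)
# After normalization every row has unit L2 norm, so only its direction
# carries information -- the setting in which von Mises-Fisher mixtures apply.
```

On such normalized data, similarity between rows reduces to the cosine of the angle between them, which is the directional structure the dbmovMFs model exploits.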
