Tensor latent block model for co-clustering

With the rapid growth of data collected in fields such as recommender systems (users, items), text mining (documents, terms), and bioinformatics (individuals, genes), co-clustering, the simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to produce homogeneous blocks, which leads to a straightforward joint interpretation of row clusters and column clusters. Many approaches exist; in this paper, we rely on the latent block model (LBM), which is flexible enough to model different types of data matrices. We extend it to tensor data (three-way arrays) by proposing a Tensor LBM (TLBM), which accommodates different relations between entities. To demonstrate the value of TLBM, we consider continuous, binary, and contingency-table datasets. To estimate the parameters, we develop a variational EM algorithm. Its performance is evaluated on synthetic and real datasets to highlight different possible applications.
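To make the idea of a variational EM for a tensor block model concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tensor of shape (rows, columns, v) in which each cell is a v-dimensional Gaussian vector with identity covariance, so that only the block means and the mixing proportions are estimated. The function names `block_means` and `tlbm_vem`, the random Dirichlet initialization, and the fixed iteration count are all illustrative choices.

```python
import numpy as np

def softmax(a, axis):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def block_means(X, Z, W, eps=1e-10):
    # mu[k, l, :] is the weighted mean of the cells x_ij assigned to block (k, l).
    num = np.einsum('ik,jl,ijv->klv', Z, W, X)
    den = np.outer(Z.sum(axis=0), W.sum(axis=0))[:, :, None]
    return num / (den + eps)

def tlbm_vem(X, g, m, n_iter=50, seed=0):
    # Sketch of a variational EM for a Gaussian tensor block model
    # (assumption: identity covariance within every block).
    rng = np.random.default_rng(seed)
    n, d, _ = X.shape
    Z = rng.dirichlet(np.ones(g), size=n)  # soft row assignments, shape (n, g)
    W = rng.dirichlet(np.ones(m), size=d)  # soft column assignments, shape (d, m)
    for _ in range(n_iter):
        # M-step: mixing proportions and block means.
        pi, rho = Z.mean(axis=0), W.mean(axis=0)
        mu = block_means(X, Z, W)
        # Squared distance of every cell to every block mean, shape (n, d, g, m).
        sq = ((X[:, :, None, None, :] - mu[None, None]) ** 2).sum(axis=-1)
        # Variational E-step: update rows given columns, then columns given rows.
        Z = softmax(np.log(pi + 1e-10) - 0.5 * np.einsum('ijkl,jl->ik', sq, W), axis=1)
        W = softmax(np.log(rho + 1e-10) - 0.5 * np.einsum('ijkl,ik->jl', sq, Z), axis=1)
    # mu comes from the last M-step, computed before the final E-step.
    return Z.argmax(axis=1), W.argmax(axis=1), mu
```

Calling `tlbm_vem(X, g=3, m=2)` returns hard row labels, hard column labels, and the estimated block means. Like any EM-type procedure, this sketch can converge to a local optimum, so in practice one would typically run it from several seeds and keep the best solution. Binary or contingency data would require replacing the Gaussian log-density with the corresponding Bernoulli or Poisson term.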
