Learning from missing data with the Latent Block Model

Missing data can be informative. Ignoring this information can lead to misleading conclusions when the data model does not allow information to be extracted from the missingness. We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of these nonignorable nonresponses, also known as Missing Not At Random (MNAR) data. A variational expectation-maximization algorithm is derived to perform inference, and a model selection criterion is presented. We assess the proposed approach in a simulation study before applying our model to the voting records of the lower house of the French Parliament, where our analysis brings out relevant groups of MPs and texts, together with a sensible interpretation of the behavior of non-voters.
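To make the inference scheme named in the abstract concrete, the following is a minimal NumPy sketch of a variational EM (VEM) loop for a fully observed Bernoulli Latent Block Model; it does not implement the MNAR mechanism the paper adds, and the function name, parameter names (tau, nu, pi, rho, alpha), and defaults are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def vem_lbm_bernoulli(X, K, L, n_iter=50, seed=0, eps=1e-10):
    """Co-cluster a binary matrix X (n x d) into K row groups and L column groups."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random soft initialisation of the variational row/column memberships.
    tau = rng.dirichlet(np.ones(K), size=n)   # n x K
    nu = rng.dirichlet(np.ones(L), size=d)    # d x L
    for _ in range(n_iter):
        # M-step: mixing proportions and block-wise Bernoulli parameters.
        pi = tau.mean(axis=0)                 # K row-group proportions
        rho = nu.mean(axis=0)                 # L column-group proportions
        alpha = (tau.T @ X @ nu) / (tau.sum(0)[:, None] * nu.sum(0)[None, :] + eps)
        alpha = np.clip(alpha, eps, 1 - eps)  # K x L block probabilities
        log_a, log_1a = np.log(alpha), np.log(1 - alpha)
        # VE-step: mean-field update of row memberships given column memberships...
        log_tau = np.log(pi + eps) + X @ nu @ log_a.T + (1 - X) @ nu @ log_1a.T
        tau = softmax_rows(log_tau)
        # ...then of column memberships given row memberships.
        log_nu = np.log(rho + eps) + X.T @ tau @ log_a + (1 - X).T @ tau @ log_1a
        nu = softmax_rows(log_nu)
    return tau, nu, alpha

# Usage on a small random binary matrix (hypothetical data):
# X = (np.random.default_rng(1).random((60, 40)) < 0.3).astype(float)
# tau, nu, alpha = vem_lbm_bernoulli(X, K=3, L=2)
```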
