Boolean Matrix Factorization and Noisy Completion via Message Passing

Boolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods because of their interpretability, but both problems are NP-hard and therefore difficult to solve. We treat these problems as maximum a posteriori (MAP) inference problems in a graphical model and present a message passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices at the boundary of theoretically possible recovery, and that it compares favorably with the state of the art in real-world applications, such as collaborative filtering with large-scale Boolean data.
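To make the problem concrete, the sketch below shows the Boolean matrix product (an OR of ANDs rather than a sum of products) and the objective that MAP inference targets. It assumes a symmetric bit-flip noise model with flip probability p < 0.5 and uniform priors on the factors; under those assumptions the log-posterior depends only on how many observed entries disagree with the product, so MAP completion amounts to minimizing those disagreements. All function names are hypothetical, and the exhaustive search is included only to show why exact inference is intractable (it enumerates 2^(K(M+N)) factor pairs). It is not the message passing algorithm described above.

```python
import itertools

import numpy as np


def boolean_product(U: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Boolean product: X[i, j] = OR_k (U[i, k] AND V[k, j])."""
    # The integer product counts the factors k with U[i, k] = V[k, j] = 1;
    # the Boolean OR is true whenever that count is positive.
    return (U.astype(np.int64) @ V.astype(np.int64)) > 0


def observed_mismatches(U, V, O, mask) -> int:
    """Count observed entries (mask == True) where the product disagrees with O."""
    X = boolean_product(U, V)
    return int(np.sum((X != O.astype(bool)) & mask.astype(bool)))


def log_posterior(U, V, O, mask, p: float) -> float:
    """Unnormalized log-posterior, ASSUMING symmetric bit-flip noise with
    probability p and uniform priors on U and V (illustrative assumptions)."""
    m = observed_mismatches(U, V, O, mask)
    n_obs = int(mask.astype(bool).sum())
    # Each matching entry contributes log(1 - p); each mismatch contributes log(p).
    return float((n_obs - m) * np.log(1 - p) + m * np.log(p))


def brute_force_map(O, mask, k: int):
    """Exhaustive MAP search over all rank-k Boolean factor pairs.

    Cost grows as 2 ** (k * (rows + cols)), so this is feasible only for toy sizes.
    """
    rows, cols = O.shape
    best, best_cost = None, None
    for bits in itertools.product([False, True], repeat=k * (rows + cols)):
        arr = np.array(bits, dtype=bool)
        U = arr[: rows * k].reshape(rows, k)
        V = arr[rows * k:].reshape(k, cols)
        cost = observed_mismatches(U, V, O, mask)
        if best_cost is None or cost < best_cost:
            best, best_cost = (U, V), cost
    return best, best_cost
```

For example, with `U = [[1, 0], [0, 1], [1, 1]]` and `V = [[1, 1, 0], [0, 1, 1]]`, the last row of the product is `[1, 1, 1]`: the middle entry is covered by both factors, but the OR caps it at 1 instead of summing to 2. Flipping one observed entry and hiding another shows a typical effect of noise and missing data. The exhaustive search can then find factors that explain the corrupted observations exactly, so they differ from the ground truth.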
