Low-Rank Boolean Matrix Approximation by Integer Programming

Low-rank approximations of data matrices are an important dimensionality reduction tool in machine learning and regression analysis. We consider the case of categorical variables, where the task can be formulated as finding low-rank approximations to Boolean matrices. In this paper we give what is, to the best of our knowledge, the first integer programming formulation that relies on only polynomially many variables and constraints; we discuss how to solve it computationally and report numerical tests on synthetic and real-world data.
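To make the underlying problem concrete, the sketch below states it in code: given a 0/1 matrix X, find A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m} whose Boolean product A∘B (entry (i, j) is the OR over l of A[i][l] AND B[l][j]) disagrees with X in as few entries as possible. Measuring error as the count of mismatched entries is an assumption here; for 0/1 matrices it equals the squared Frobenius norm of X − A∘B. The solver is a deliberately naive exhaustive search, not the integer programming formulation described above, and the function names are hypothetical. It enumerates all 2^(n·k) choices of A, which is exactly the kind of blow-up that a formulation with polynomially many variables and constraints avoids.

```python
from itertools import product


def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = OR over l of (A[i][l] AND B[l][j])."""
    k, m = len(B), len(B[0])
    return [[int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
            for row in A]


def approximation_error(X, A, B):
    """Number of entries where X and A o B differ (= ||X - A o B||_F^2 for 0/1 data)."""
    Z = boolean_product(A, B)
    return sum(x != z for xr, zr in zip(X, Z) for x, z in zip(xr, zr))


def _column_error(A, c, X, j):
    """Mismatches in column j of X when column j of B is the candidate vector c."""
    return sum(
        int(any(a and b for a, b in zip(A[i], c))) != X[i][j]
        for i in range(len(X))
    )


def best_rank_k_bruteforce(X, k):
    """Exact Boolean rank-k approximation by exhaustive search (hypothetical helper).

    Enumerates every A in {0,1}^(n x k). For a fixed A the error separates over
    the columns of X, so each column of B is chosen independently among the
    2^k candidate vectors. Runtime grows like 2^(n*k): tiny instances only.
    """
    n, m = len(X), len(X[0])
    candidates = list(product((0, 1), repeat=k))  # all possible columns of B
    best_A, best_B, best_err = None, None, n * m + 1
    for bits in product((0, 1), repeat=n * k):
        A = [list(bits[i * k:(i + 1) * k]) for i in range(n)]
        cols, err = [], 0
        for j in range(m):
            c = min(candidates, key=lambda v: _column_error(A, v, X, j))
            cols.append(c)
            err += _column_error(A, c, X, j)
        if err < best_err:
            # Assemble B (k x m) from the chosen columns.
            B = [[cols[j][l] for j in range(m)] for l in range(k)]
            best_A, best_B, best_err = A, B, err
    return best_A, best_B, best_err


X = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
print(best_rank_k_bruteforce(X, 1)[2])  # 2 mismatches at k = 1
print(best_rank_k_bruteforce(X, 2)[2])  # 0 mismatches at k = 2
```

The small example also shows why the Boolean setting differs from ordinary linear algebra: over the reals this X has rank 3 (its determinant is −1), yet it is reproduced exactly by a Boolean product with k = 2, for instance A = [[1,0],[1,1],[0,1]] and B = [[1,1,0],[0,1,1]], while no single rectangle of ones gets closer than 2 mismatches.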