Faster Algorithms for Binary Matrix Factorization

We give faster approximation algorithms for well-studied variants of Binary Matrix Factorization (BMF), where we are given a binary $m \times n$ matrix $A$ and would like to find binary rank-$k$ matrices $U, V$ to minimize the Frobenius norm of $U \cdot V - A$. In the first setting, $U \cdot V$ denotes multiplication over $\mathbb{Z}$, and we give a constant-factor approximation algorithm that runs in $2^{O(k^2 \log k)} \mathrm{poly}(mn)$ time, improving upon the previous $\min(2^{2^k}, 2^n) \mathrm{poly}(mn)$ time. Our techniques generalize to minimizing $\|U \cdot V - A\|_p$ for $p \geq 1$, in $2^{O(k^{\lceil p/2 \rceil + 1} \log k)} \mathrm{poly}(mn)$ time. For $p = 1$, this has a graph-theoretic consequence, namely, a $2^{O(k^2)} \mathrm{poly}(mn)$-time algorithm to approximate a graph as a union of disjoint bicliques. In the second setting, $U \cdot V$ is over $\mathrm{GF}(2)$, and we give a bicriteria constant-factor approximation algorithm that runs in $2^{O(k^3)} \mathrm{poly}(mn)$ time to find binary rank-$O(k \log m)$ matrices $U, V$ whose cost is as good as the best rank-$k$ approximation, improving upon $\min(2^{2^k} mn, \min(m,n)^{k^{O(1)}} \mathrm{poly}(mn))$ time.
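To make the two settings concrete, the following is a minimal sketch of the BMF objective only, not of the approximation algorithms described above. It evaluates the squared Frobenius cost of $U \cdot V - A$ with the product taken either over $\mathbb{Z}$ or over $\mathrm{GF}(2)$, and finds the optimum by exhaustive search, which is feasible only for tiny inputs. The helper names `matmul`, `frobenius_cost_sq`, and `brute_force_bmf` are hypothetical and chosen for illustration.

```python
from itertools import product


def matmul(U, V, mod=None):
    """Multiply U (m x k) by V (k x n); reduce entries modulo `mod` if given."""
    k, n = len(V), len(V[0])
    P = [[sum(row[t] * V[t][j] for t in range(k)) for j in range(n)] for row in U]
    if mod is not None:
        P = [[x % mod for x in row] for row in P]
    return P


def frobenius_cost_sq(A, P):
    """Squared Frobenius norm of A - P."""
    return sum((a - p) ** 2 for ra, rp in zip(A, P) for a, p in zip(ra, rp))


def brute_force_bmf(A, k, mod=None):
    """Exhaustive search over all binary U (m x k) and V (k x n).

    Illustration only: this enumerates 2^(mk + kn) candidate pairs and is
    unrelated to the running times claimed in the abstract.
    """
    m, n = len(A), len(A[0])
    best = None
    for u_bits in product((0, 1), repeat=m * k):
        U = [list(u_bits[i * k:(i + 1) * k]) for i in range(m)]
        for v_bits in product((0, 1), repeat=k * n):
            V = [list(v_bits[t * n:(t + 1) * n]) for t in range(k)]
            cost = frobenius_cost_sq(A, matmul(U, V, mod))
            if best is None or cost < best[0]:
                best = (cost, U, V)
    return best


# A 3 x 3 binary matrix whose third row is the GF(2) sum of the first two.
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]

cost_z, _, _ = brute_force_bmf(A, k=2)             # product over the integers
cost_gf2, _, _ = brute_force_bmf(A, k=2, mod=2)    # product over GF(2)
print(cost_z, cost_gf2)
```

On this example the same rank budget $k = 2$ fits $A$ exactly over $\mathrm{GF}(2)$ but not over $\mathbb{Z}$, which is why the two settings are treated separately.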
