Boolean matrix factorization meets consecutive ones property

Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a variant of Boolean matrix factorization in which we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular and linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard remedy is to bundle edges together and represent them as ribbons. We show that OBMF can be used for edge bundling in combination with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard, but also that no polynomial-time algorithm can yield a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm that at each step looks for the best rank-1 factorization. Since even finding the best rank-1 factorization is NP-hard, we propose an iterative algorithm in which we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using PQ-trees. We also extend the problem to the cyclic ones property and to symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
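To make the objects in the abstract concrete, the following is a minimal Python sketch of a Boolean matrix product, the reconstruction error of a factorization, and a check that the ones in a vector are consecutive. It is an illustration only, not the paper's algorithm: the function names are hypothetical, and the check assumes a fixed, given ordering, whereas the consecutive ones property asks whether some ordering with this property exists (which is what PQ-trees decide).

```python
def boolean_product(B, C):
    """Boolean product of an n x k matrix B and a k x m matrix C (lists of 0/1)."""
    k = len(C)
    m = len(C[0]) if C else 0
    return [
        [int(any(row[t] and C[t][j] for t in range(k))) for j in range(m)]
        for row in B
    ]


def reconstruction_error(A, B, C):
    """Number of cells where A differs from the Boolean product of B and C."""
    P = boolean_product(B, C)
    return sum(a != p for ra, rp in zip(A, P) for a, p in zip(ra, rp))


def ones_are_consecutive(vec):
    """True if the ones in vec form one contiguous run (in the given order)."""
    idx = [i for i, v in enumerate(vec) if v]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


# Hypothetical rank-2 example: two overlapping blocks plus one noisy cell.
B = [[1, 0], [1, 1], [0, 1], [0, 1]]   # 4 x 2 left factor
C = [[1, 1, 0, 0], [0, 1, 1, 1]]       # 2 x 4 right factor
A = boolean_product(B, C)
A[0][2] = 1                            # flip one cell so the fit is not exact

error = reconstruction_error(A, B, C)
# Columns of B and rows of C each cover a contiguous range in this ordering.
left_ok = all(ones_are_consecutive(col) for col in zip(*B))
right_ok = all(ones_are_consecutive(row) for row in C)
```

In this example each rank-1 component (a column of B paired with the matching row of C) covers a contiguous block of rows and columns, which is the kind of structure that can be drawn as a single ribbon in a linear layout.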
