From-below approximations in Boolean matrix factorization: Geometry and new algorithm

Abstract We present new results on Boolean matrix factorization and a new algorithm based on these results. The results emphasize the significance of factorizations that provide from-below approximations of the input matrix. Whereas previously proposed algorithms do not take into account that different matrix entries may differ in significance, our results help measure this significance and suggest where to focus when computing factors. An experimental evaluation on both synthetic and real data shows that the new algorithm performs well in two respects: the first few factors cover a large part of the input matrix, and only a small number of factors is needed for an almost exact decomposition. The evaluation also indicates that the algorithm outperforms the available ones in both respects. We also propose topics for future research.
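To make the terminology concrete, the following is a minimal sketch of the Boolean matrix product, of a from-below approximation (a product that never places a 1 where the input matrix has a 0), and of coverage by the first few factors. It is not the algorithm proposed in the paper. The function names `boolean_product`, `is_from_below`, and `coverage`, as well as the small example matrix, are assumptions made here for illustration only.

```python
def boolean_product(A, B):
    """Boolean product: entry (i, j) is the OR over l of A[i][l] AND B[l][j]."""
    k = len(B)
    m = len(B[0]) if B else 0
    return [
        [int(any(A[i][l] and B[l][j] for l in range(k))) for j in range(m)]
        for i in range(len(A))
    ]


def is_from_below(I, A, B):
    """True if the product of A and B has a 1 only where I has a 1."""
    P = boolean_product(A, B)
    return all(p <= x for prow, irow in zip(P, I) for p, x in zip(prow, irow))


def coverage(I, A, B):
    """Fraction of the 1s in I that are reproduced by the product of A and B."""
    P = boolean_product(A, B)
    ones = sum(sum(row) for row in I)
    covered = sum(p and x for prow, irow in zip(P, I) for p, x in zip(prow, irow))
    return covered / ones if ones else 1.0


# Hypothetical 3x3 input matrix with two overlapping rectangles of 1s.
I = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# Two factors: column l of A lists the rows of factor l, row l of B its columns.
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[1, 1, 0],
     [0, 1, 1]]

# Keeping only the first factor gives a from-below approximation.
A1 = [[row[0]] for row in A]
B1 = [B[0]]

exact = boolean_product(A, B) == I
first_factor_from_below = is_from_below(I, A1, B1)
first_factor_coverage = coverage(I, A1, B1)

# A single all-ones factor covers every 1 of I but also overcovers its 0s,
# so it is not a from-below approximation.
overcovering = is_from_below(I, [[1], [1], [1]], [[1, 1, 1]])
```

In this example the two factors reproduce the input exactly, while the first factor alone covers four of the seven 1s without introducing any spurious 1; this is the sense in which coverage by the first few factors and the number of factors needed for an almost exact decomposition can be measured.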
