Factorizing Boolean matrices using formal concepts and iterative usage of essential entries

Abstract: We present a new algorithm for the factorization of Boolean matrices (binary relations), i.e. for the extraction of factors from relational data, based on a new insight into the geometry of factorizations. The algorithm iteratively exploits the so-called essential entries in relational data and outperforms, sometimes significantly, the available algorithms for exact and almost exact factorization of relational data. We describe the rationale for the new approach, present the algorithm, evaluate it experimentally, and state open problems.
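To make the notion of an exact factorization concrete, the following minimal sketch shows the Boolean matrix product and checks that a small matrix is reproduced by two factor matrices. It is an illustration only, not the algorithm presented in the paper; the function names `boolean_product` and `coverage_error` and the example matrices are assumptions introduced here.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 iff A[i][l] = B[l][j] = 1 for some l."""
    k = len(B)  # number of factors (hypothetical example: k = 2)
    return [
        [int(any(A[i][l] and B[l][j] for l in range(k))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def coverage_error(I, A, B):
    """Number of entries in which I differs from the Boolean product of A and B."""
    P = boolean_product(A, B)
    return sum(
        I[i][j] != P[i][j] for i in range(len(I)) for j in range(len(I[0]))
    )


# Illustrative 3x3 input matrix (objects x attributes).
I = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
# Object-factor matrix A and factor-attribute matrix B with two factors.
A = [
    [1, 0],
    [1, 1],
    [0, 1],
]
B = [
    [1, 1, 0],
    [0, 1, 1],
]

print(boolean_product(A, B) == I)  # exact factorization if True
print(coverage_error(I, A, B))     # number of mismatched entries
```

A factorization is exact when the error is zero and almost exact when only a small number of entries differ; how small is a choice the sketch leaves open.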
