Model order selection for approximate Boolean matrix factorization problem

Abstract A key step in applying Boolean matrix factorization (BMF) is establishing the correct model order for the data, i.e., deciding where the knowledge stops and the noise starts, or, put simply, choosing the number of factors that describes the data well. There are two main approaches to BMF: the Discrete Basis Problem (DBP) and the Approximation Factorization Problem (AFP). Although a model order selection technique exists for DBP, no technique is tailored to AFP. We show that the number of factors established for DBP cannot be used in AFP, and we present a novel way of establishing the proper number of factors that reflects the nature of AFP. Moreover, we show that, from a knowledge-representation viewpoint, the number of factors established for AFP is better than that for DBP.
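
The abstract does not spell out how DBP and AFP differ, so the following minimal sketch may help. It assumes the formulations commonly used in the BMF literature: DBP fixes the number of factors k in advance and asks for the smallest reconstruction error, while AFP fixes a tolerated error and asks for the smallest k. The function names (`error_with_first_k`, `dbp_view`, `afp_view`) and the toy matrices are hypothetical, and the code only evaluates a given, already ordered list of factors. It does not solve either optimization problem and is not the paper's model order selection method.

```python
from typing import List, Optional

Matrix = List[List[int]]


def error_with_first_k(I: Matrix, A: Matrix, B: Matrix, k: int) -> int:
    """Count entries where I differs from the Boolean product of the first k factors.

    The Boolean product is (A o B)[i][j] = OR over l of (A[i][l] AND B[l][j]).
    """
    errors = 0
    for i, row in enumerate(I):
        for j, value in enumerate(row):
            approx = int(any(A[i][l] and B[l][j] for l in range(k)))
            errors += int(approx != value)
    return errors


def dbp_view(I: Matrix, A: Matrix, B: Matrix, k: int) -> int:
    """DBP-style question: k is prescribed; report the error it leaves."""
    return error_with_first_k(I, A, B, k)


def afp_view(I: Matrix, A: Matrix, B: Matrix, max_error: int) -> Optional[int]:
    """AFP-style question: the error is prescribed; report the shortest
    prefix of the given factors that meets it (None if no prefix does)."""
    for k in range(len(B) + 1):
        if error_with_first_k(I, A, B, k) <= max_error:
            return k
    return None


# Toy data (an assumption for illustration): a 4x4 Boolean matrix with 7 ones
# and three hand-picked factors (column l of A paired with row l of B).
I = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
A = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
B = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print("error with k = 2 factors:", dbp_view(I, A, B, 2))
    print("factors needed for error <= 1:", afp_view(I, A, B, 1))
    print("factors needed for error = 0:", afp_view(I, A, B, 0))
```

On this toy input, two factors leave one uncovered entry, whereas exact coverage needs all three. This shows in miniature why a factor count chosen under one formulation need not suit the other.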
