Bias aware probabilistic Boolean matrix factorization

Boolean matrix factorization (BMF) is a combinatorial problem arising in a wide range of applications, including recommendation systems, collaborative filtering, and dimensionality reduction. Existing BMF methods often assume a homoscedastic noise model; however, in real-world data, the deviations of observations from their true values almost surely vary from one data point to another because of stochastic noise, so not every data point is equally suitable for fitting a model. In this setting, it is not ideal to treat all data points as identically distributed. Motivated by these observations, we introduce a probabilistic BMF model, called bias aware BMF (BABF), that accounts for the object-wise and feature-wise bias distributions separately. To the best of our knowledge, BABF is the first approach to Boolean decomposition that considers both feature-wise and object-wise bias in binary data. We conducted experiments on datasets with different levels of background noise, different bias levels, and different sizes of signal patterns to test the effectiveness of our method in various scenarios. We demonstrate that our model outperforms state-of-the-art factorization methods in both accuracy and efficiency when recovering the original datasets, and that the inferred bias level is highly significantly correlated with the true bias in both simulated and real-world datasets.
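To make the setting concrete, the following is a minimal sketch of a Boolean matrix product and of one way binary data with object-wise and feature-wise bias could be simulated. It is an illustration only and not the BABF model itself, which this abstract does not specify. The function names, the parameters `row_bias`, `col_bias`, and `noise`, and the assumption that the background probability of entry (i, j) is the product `row_bias[i] * col_bias[j]` are all hypothetical.

```python
import numpy as np


def boolean_product(A, B):
    """Boolean matrix product: X[i, j] = OR over k of (A[i, k] AND B[k, j])."""
    return (np.asarray(A, dtype=int) @ np.asarray(B, dtype=int)) > 0


def simulate_biased_binary(A, B, row_bias, col_bias, noise, rng):
    """Simulate an observed binary matrix (hypothetical generative sketch).

    Assumptions, not taken from the source:
      * entry (i, j) is switched on as background with probability
        row_bias[i] * col_bias[j] (object-wise times feature-wise bias);
      * every entry is then flipped independently with probability `noise`.
    """
    signal = boolean_product(A, B)
    # Heteroscedastic background: probability differs per row and per column.
    background = rng.random(signal.shape) < np.outer(row_bias, col_bias)
    # Homoscedastic flipping noise applied on top.
    flips = rng.random(signal.shape) < noise
    return (signal | background) ^ flips


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[1, 0], [1, 1], [0, 0]])     # objects x patterns
    B = np.array([[1, 1, 0], [0, 1, 1]])       # patterns x features
    X = simulate_biased_binary(
        A, B,
        row_bias=np.array([0.1, 0.1, 0.8]),    # third object is strongly biased
        col_bias=np.array([0.5, 0.5, 0.5]),
        noise=0.05,
        rng=rng,
    )
    print(X.astype(int))
```

Under a homoscedastic model every entry would share one background rate; here the third object is much more likely to contain ones that belong to no pattern, which is the kind of object-wise bias the abstract refers to.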
