Matrix factorization with binary components

Motivated by an application in computational biology, we consider low-rank matrix factorization with $\{0,1\}$-constraints on one of the factors and, optionally, convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size $2^{m \cdot r}$, where $m$ is the dimension of the data points and $r$ is the rank of the factorization. Despite this apparent intractability, we provide, in line with recent work on non-negative matrix factorization by Arora et al. (2012), an algorithm that provably recovers the underlying factorization in the exact case with $O(m r 2^r + mnr + r^2 n)$ operations for $n$ data points. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.
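
To make the exact-case setting concrete, the following is a minimal sketch of one way a binary factor could be recovered, and it is not necessarily the algorithm of the paper. It assumes a noise-free factorization $D = TA$ with $T \in \{0,1\}^{m \times r}$ of full column rank and $A$ of full row rank, and it ignores the optional convex constraints on $A$. Under these assumptions the columns of $T$ lie in the column space of $D$, so it is enough to enumerate $2^r$ binary patterns on $r$ linearly independent coordinates instead of all $2^{m \cdot r}$ binary matrices. The function names, the tolerance, and the greedy row selection are illustrative choices.

```python
import itertools
import numpy as np


def independent_rows(U, r):
    """Greedily pick r row indices of U that are linearly independent."""
    rows = []
    for i in range(U.shape[0]):
        if np.linalg.matrix_rank(U[rows + [i]]) == len(rows) + 1:
            rows.append(i)
        if len(rows) == r:
            return rows
    raise ValueError("column space has dimension smaller than r")


def binary_vectors_in_span(D, r, tol=1e-8):
    """Return all nonzero {0,1}-vectors in the rank-r column space of D.

    Hypothetical helper: enumerates 2^r candidates at O(m r) cost each.
    """
    m = D.shape[0]
    # Orthonormal basis of the column space (m x r).
    U = np.linalg.svd(D, full_matrices=False)[0][:, :r]
    rows = independent_rows(U, r)
    # Z = U @ inv(U[rows]); a vector in the span is determined by its
    # entries on `rows`, and Z maps those r entries to all m entries.
    Z = np.linalg.solve(U[rows].T, U.T).T
    found = []
    for bits in itertools.product((0, 1), repeat=r):
        v = Z @ np.array(bits, dtype=float)
        v_round = np.round(v)
        is_binary = np.all((v_round == 0) | (v_round == 1))
        is_close = np.all(np.abs(v - v_round) < tol)
        if is_binary and is_close and np.any(v_round):
            found.append(v_round.astype(int))
    if not found:
        return np.zeros((m, 0), dtype=int)
    return np.column_stack(found)


# Small synthetic example (made-up data, exact case).
T = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]])
A = np.array([[1.0, 2.0, 0.0, 1.0, 3.0],
              [0.0, 1.0, 1.0, 2.0, 1.0],
              [2.0, 0.0, 1.0, 1.0, 0.0]])
D = T @ A

T_hat = binary_vectors_in_span(D, r=3)
# With the binary factor fixed, the second factor is a least-squares fit.
A_hat = np.linalg.lstsq(T_hat, D, rcond=None)[0]
print(T_hat)
print(np.allclose(T_hat @ A_hat, D))
```

The enumeration visits $2^r$ patterns at a cost of order $m r$ each, which is consistent with the $m r 2^r$ term in the operation count above. In general the column space may contain more than $r$ nonzero binary vectors (for example, sums of columns of $T$ that happen to be binary); the sketch returns all of them and does not address how to select $r$ of them, and the recovered columns are determined only up to permutation.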

[1] P. Erdős. On a lemma of Littlewood and Offord, 1945.

[2] P. Paatero, et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, 1994.

[3] E. Szemerédi, et al. On the probability that a random ±1-matrix is singular, 1995.

[4] Ming Gu, et al. Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization, 1996, SIAM J. Sci. Comput.

[5] Alle-Jan van der Veen, et al. Analytical method for blind binary signal separation, 1997, Proceedings of 13th International Conference on Digital Signal Processing.

[6] T. P. Dinh, et al. Convex analysis approach to d.c. programming: Theory, Algorithm and Applications, 1997.

[7] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[8] Lawrence K. Saul, et al. A Generalized Linear Model for Principal Component Analysis of Binary Data, 2003, AISTATS.

[9] Chiara Sabatti, et al. Network component analysis: Reconstruction of regulatory signals in biological systems, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[10] Daphne Koller, et al. Decomposing Gene Expression into Cellular Processes, 2002, Pacific Symposium on Biocomputing.

[11] Victoria Stodden, et al. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?, 2003, NIPS.

[12] Joydeep Ghosh, et al. Model-based overlapping clustering, 2005, KDD '05.

[13] Zoubin Ghahramani, et al. Modeling Dyadic Data with Binary Latent Factors, 2006, NIPS.

[14] Chris H. Q. Ding, et al. Binary Matrix Factorization with Applications, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[15] Chih-Jen Lin, et al. Projected Gradient Methods for Nonnegative Matrix Factorization, 2007, Neural Computation.

[16] Ata Kabán, et al. Factorisation and denoising of 0-1 data: A variational approach, 2008, Neurocomputing.

[17] Jean Ponce, et al. Convex Sparse Matrix Factorizations, 2008, ArXiv.

[18] Pauli Miettinen, et al. The Discrete Basis Problem, 2006, IEEE Transactions on Knowledge and Data Engineering.

[19] R. Tibshirani, et al. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, 2009, Biostatistics.

[20] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[21] Hans-Georg Müller, et al. Functional Data Analysis, 2016.

[22] Terence Tao, et al. The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi, 2010, Comb.

[23] Joel A. Tropp, et al. Factoring nonnegative matrices with linear programs, 2012, NIPS.

[24] Sanjeev Arora, et al. Computing a nonnegative matrix factorization -- provably, 2011, STOC '12.

[25] Lei Xu, et al. Transcription Network Analysis by A Sparse Binary Factor Analysis Algorithm, 2012, J. Integr. Bioinform.

[26] Devin C. Koestler, et al. DNA methylation arrays as surrogate measures of cell mixture distribution, 2012, BMC Bioinformatics.

[27] V. Vu, et al. Small Ball Probability, Inverse Theorems, and Applications, 2012, 1301.0019.