Paper information - Matrix factorization with binary components

Matrix factorization with binary components

Motivated by an application in computational biology, we consider low-rank matrix factorization with $\{0,1\}$-constraints on one of the factors and, optionally, convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size $2^{m \cdot r}$, where $m$ is the dimension of the data points and $r$ is the rank of the factorization. Despite this apparent intractability, we provide, in line with recent work on non-negative matrix factorization by Arora et al. (2012), an algorithm that provably recovers the underlying factorization in the exact case with $O(m r 2^r + mnr + r^2 n)$ operations for $n$ data points. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.
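To make the gap between the two quantities in the abstract concrete, the following minimal sketch evaluates the size of the combinatorial constraint set, $2^{m \cdot r}$, next to the expression inside the stated $O(\cdot)$ bound. It is only an arithmetic illustration: the function names and the example sizes are assumptions made here, constants hidden by the $O(\cdot)$ notation are ignored, and nothing in it implements the paper's algorithm.

```python
def search_space_size(m: int, r: int) -> int:
    """Number of binary m-by-r matrices, i.e. 2^(m*r)."""
    return 2 ** (m * r)


def stated_operation_count(m: int, n: int, r: int) -> int:
    """Expression inside the O(.) bound from the abstract (constants ignored)."""
    return m * r * 2 ** r + m * n * r + r ** 2 * n


if __name__ == "__main__":
    # Hypothetical problem sizes chosen for illustration; not taken from the paper.
    m, n, r = 100, 1000, 5
    print("size of constraint set:", search_space_size(m, r))
    print("stated operation count:", stated_operation_count(m, n, r))
```

For fixed rank $r$, the second expression grows linearly in $m$ and $n$, whereas the constraint set grows exponentially in $m$; the exponential dependence that remains in the bound is on $r$ alone.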

[1] P. Erdös. On a lemma of Littlewood and Offord, 1945.

[2] P. Paatero, et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, 1994.

[3] E. Szemerédi, et al. On the probability that a random ±1-matrix is singular, 1995.

[4] Ming Gu, et al. Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization, 1996, SIAM J. Sci. Comput.

[5] Alle-Jan van der Veen, et al. Analytical method for blind binary signal separation, 1997, Proceedings of 13th International Conference on Digital Signal Processing.

[6] T. P. Dinh, et al. Convex analysis approach to d.c. programming: Theory, Algorithm and Applications, 1997.

[7] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[8] Lawrence K. Saul, et al. A Generalized Linear Model for Principal Component Analysis of Binary Data, 2003, AISTATS.

[9] Chiara Sabatti, et al. Network component analysis: Reconstruction of regulatory signals in biological systems, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[10] Daphne Koller, et al. Decomposing Gene Expression into Cellular Processes, 2002, Pacific Symposium on Biocomputing.

[11] Victoria Stodden, et al. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?, 2003, NIPS.

[12] Joydeep Ghosh, et al. Model-based overlapping clustering, 2005, KDD '05.

[13] Zoubin Ghahramani, et al. Modeling Dyadic Data with Binary Latent Factors, 2006, NIPS.

[14] Chris H. Q. Ding, et al. Binary Matrix Factorization with Applications, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[15] Chih-Jen Lin, et al. Projected Gradient Methods for Nonnegative Matrix Factorization, 2007, Neural Computation.

[16] Ata Kabán, et al. Factorisation and denoising of 0-1 data: A variational approach, 2008, Neurocomputing.

[17] Jean Ponce, et al. Convex Sparse Matrix Factorizations, 2008, ArXiv.

[18] Pauli Miettinen, et al. The Discrete Basis Problem, 2006, IEEE Transactions on Knowledge and Data Engineering.

[19] R. Tibshirani, et al. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, 2009, Biostatistics.

[20] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[21] Hans-Georg Müller, et al. Functional Data Analysis, 2016.

[22] Terence Tao, et al. The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi, 2010, Comb.

[23] Joel A. Tropp, et al. Factoring nonnegative matrices with linear programs, 2012, NIPS.

[24] Sanjeev Arora, et al. Computing a nonnegative matrix factorization -- provably, 2011, STOC '12.

[25] Lei Xu, et al. Transcription Network Analysis by A Sparse Binary Factor Analysis Algorithm, 2012, J. Integr. Bioinform.

[26] Devin C. Koestler, et al. DNA methylation arrays as surrogate measures of cell mixture distribution, 2012, BMC Bioinformatics.

[27] V. Vu, et al. Small Ball Probability, Inverse Theorems, and Applications, 2012, arXiv:1301.0019.