Binary principal component analysis in the Netflix collaborative filtering task

We propose an algorithm for binary principal component analysis (PCA) that scales well to very high dimensional and very sparse data. Binary PCA finds components from data assuming Bernoulli distributions for the observations. The probabilistic approach allows for straightforward treatment of missing values. An example application is collaborative filtering using the Netflix data. The results are comparable with those reported for single methods in the literature and through blending we are able to improve our previously obtained best result with PCA.