Unsupervised Learning with Permuted Data

We consider the problem of unsupervised learning from a matrix of data vectors in which the observed values in each row are randomly permuted in an unknown fashion. Such problems arise naturally in areas such as computer vision and text modeling, where measurements need not be in correspondence with the correct features. We provide a general theoretical characterization of the difficulty of "unscrambling" the values of the rows for such problems and relate the optimal error rate to the well-known Bayes classification error rate. For known parametric distributions, we derive closed-form expressions for the optimal error rate that provide insight into what makes this problem difficult in practice. Finally, we show how the Expectation-Maximization (EM) procedure can be used to simultaneously estimate both a probabilistic model for the features and a distribution over the correspondence of the row values.
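The abstract does not spell out the model or the update equations, so the following is only a minimal sketch under assumptions of our own, not the authors' method: each unscrambled row has independent Gaussian features, each row is scrambled by a permutation drawn from an unknown distribution `pi` over all d! permutations, and d is small enough to enumerate every permutation. It illustrates EM jointly estimating the feature model (means `mu`, standard deviations `sd`) and the distribution over correspondences, plus a plug-in estimate of the optimal (MAP) unscrambling error. Here unscrambling is treated as a d!-class classification problem, so its Bayes error is E[1 - max over sigma of p(sigma | y)]. All function and variable names are hypothetical.

```python
import itertools
import numpy as np


def log_normal(x, mu, sd):
    """Elementwise log density of N(mu, sd^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mu) / sd) ** 2


def row_log_joint(Y, perms, mu, sd, pi):
    """log p(y_i, sigma) for every row i and every candidate permutation sigma.

    Assumed convention: under perms[s], feature j's value sits in column
    perms[s, j] of the observed row.
    """
    aligned = Y[:, perms]  # shape (N, d!, d); [i, s, j] = Y[i, perms[s, j]]
    return np.log(pi)[None, :] + log_normal(aligned, mu, sd).sum(axis=2)


def posterior(log_joint):
    """Normalize over permutations with log-sum-exp; also return log p(y_i)."""
    m = log_joint.max(axis=1, keepdims=True)
    log_py = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return np.exp(log_joint - log_py[:, None]), log_py


def em_permuted(Y, n_iter=50):
    """EM for independent Gaussian features with an unknown permutation per row."""
    N, d = Y.shape
    perms = np.array(list(itertools.permutations(range(d))))  # all d! orderings
    pi = np.full(len(perms), 1.0 / len(perms))
    # Heuristic start: spread the means over quantiles of the pooled values.
    mu = np.quantile(Y, (np.arange(d) + 0.5) / d)
    sd = np.full(d, Y.std())
    aligned = Y[:, perms]
    history = []  # observed-data log-likelihood before each M-step
    for _ in range(n_iter):
        # E-step: posterior weight of each permutation for each row.
        W, log_py = posterior(row_log_joint(Y, perms, mu, sd, pi))
        history.append(log_py.sum())
        # M-step: weighted maximum-likelihood updates (each row's weights sum to 1).
        pi = W.mean(axis=0)
        mu = np.einsum("is,isj->j", W, aligned) / N
        sd = np.sqrt(np.einsum("is,isj->j", W, (aligned - mu) ** 2) / N)
    return mu, sd, pi, perms, history


def bayes_unscramble_error(Y, mu, sd, pi, perms):
    """Plug-in Monte Carlo estimate of the MAP unscrambling error rate:
    the mean of 1 - max_sigma p(sigma | y) over rows drawn from the model."""
    W, _ = posterior(row_log_joint(Y, perms, mu, sd, pi))
    return float(np.mean(1.0 - W.max(axis=1)))


# Synthetic example: 3 well-separated Gaussian features, rows scrambled uniformly.
rng = np.random.default_rng(0)
true_mu, true_sd = np.array([-4.0, 0.0, 4.0]), np.ones(3)
X = rng.normal(true_mu, true_sd, size=(500, 3))
Y = np.array([row[rng.permutation(3)] for row in X])

mu_hat, sd_hat, pi_hat, perms, history = em_permuted(Y)
uniform_pi = np.full(len(perms), 1.0 / len(perms))
err = bayes_unscramble_error(Y, true_mu, true_sd, uniform_pi, perms)
print(np.sort(mu_hat), pi_hat.round(3), err)
```

Because `pi` is estimated freely, the features are identifiable only up to a global relabeling, which is why the sketch compares sorted means. Enumerating all d! permutations is only practical for small d; larger problems would need an approximation that this sketch does not attempt.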

Sridevi Parise, Padhraic Smyth, Sergey Kirshner
