Applying Discrete PCA in Data Analysis

Methods for analysis of principal components in discrete data have existed for some time under various names, such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory and present applications of these methods to some common statistical tasks. We show that these methods can be interpreted as a discrete version of independent component analysis (ICA). We develop a hierarchical version that yields components at different levels of detail, along with additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines and on an information retrieval task.
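To make the Gibbs-sampling idea concrete, here is a minimal sketch of the standard collapsed Gibbs sampler for the Dirichlet-multinomial form of discrete PCA (the model also known as multinomial PCA or latent Dirichlet allocation). It is an illustration under stated assumptions, not the paper's own sampler. It omits the Gamma-Poisson variant, the hierarchical version, and the additional sampling techniques the paper develops. The function name `gibbs_discrete_pca` and the hyperparameter defaults (`alpha`, `beta`, `n_iters`) are hypothetical choices made for this example.

```python
import random
from typing import List, Tuple


def gibbs_discrete_pca(
    docs: List[List[int]],
    vocab_size: int,
    n_components: int,
    alpha: float = 0.5,   # hypothetical symmetric Dirichlet prior on component proportions
    beta: float = 0.1,    # hypothetical symmetric Dirichlet prior on component-word distributions
    n_iters: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampler for the Dirichlet-multinomial form of discrete PCA.

    Returns (theta, phi): per-document component proportions and
    per-component word distributions, estimated from the final sample.
    """
    rng = random.Random(seed)
    K, V = n_components, vocab_size
    # Count tables: document-component, component-word, and component totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token to a random component to start the chain.
    z: List[List[int]] = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_di = j | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled component.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimates given the final assignments.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: two groups of documents over disjoint vocabularies.
    docs = [[0, 1, 2, 0, 1, 2] for _ in range(5)] + [[3, 4, 5, 3, 4, 5] for _ in range(5)]
    theta, phi = gibbs_discrete_pca(docs, vocab_size=6, n_components=2)
    for d, row in enumerate(theta):
        print(d, [round(p, 2) for p in row])
```

Each sweep resamples one token's component from its full conditional with the component parameters integrated out. This is why only count tables need to be stored.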

Wray L. Buntine | Aleks Jakulin