DISTRIBUTION-BASED ESTIMATION OF THE LATENT VARIABLES AND ITS ACCURACY

Hierarchical probabilistic models, such as mixture distributions and hidden Markov models, are widely used for unsupervised learning. They consist of observable and latent variables, which represent the observed data and the underlying data-generating process, respectively. These models are used in two ways, distinguished by the variable being estimated: predicting future or unseen data is observable-variable (OV) estimation, and analyzing how the given data were generated is latent-variable (LV) estimation. The asymptotic accuracy of OV estimation has been elucidated for many models. LV estimation, on the other hand, has not been studied sufficiently. In this talk, an error function that measures the accuracy of LV estimation is formulated, and its asymptotic form is derived for the maximum-likelihood (ML) and Bayes methods. The results provide a distribution-based evaluation of unsupervised learning and show that the Bayes method is more accurate than the ML method.