Effects of additional data on Bayesian clustering

Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and additional information is expected to improve the accuracy of that estimation. Many learning methods can exploit additional data, including semi-supervised learning and transfer learning. From a statistical point of view, however, a more complex probabilistic model that encompasses both the original and the additional data may be less accurate, because it has a higher-dimensional parameter. This paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing model complexity.
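As a minimal illustration of the setting the abstract describes (not the model or analysis from the paper itself), the sketch below fits a two-component Gaussian mixture by EM and compares latent-variable estimation with and without a small labeled subset, i.e., a semi-supervised variant in which the responsibilities of labeled points are clamped to their known labels. The synthetic data, the function `em_gmm`, and all parameter choices are illustrative assumptions; the paper's treatment is Bayesian and asymptotic, whereas this is only a toy experiment.

```python
# Minimal sketch (assumed setup, not the paper's model): semi-supervised
# EM for a two-component 1-D Gaussian mixture. The latent variable z_i is
# the cluster assignment; the "additional data" is a small labeled subset.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two Gaussian clusters centered at -2 and +2.
n = 200
z_true = rng.integers(0, 2, size=n)                  # latent cluster labels
x = rng.normal(loc=np.where(z_true == 0, -2.0, 2.0), scale=1.0)

# "Additional data": labels observed for the first m points only.
m = 20
labeled_idx = np.arange(m)

def em_gmm(x, labeled_idx=None, labels=None, n_iter=100):
    """EM for a 2-component Gaussian mixture. Labeled points (if any)
    have their responsibilities fixed to the observed labels, which is
    the standard semi-supervised EM variant."""
    mu = np.array([x.min(), x.max()])                # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i, params);
        # the constant -0.5*log(2*pi) cancels in the normalization.
        log_p = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
                 - np.log(sigma) + np.log(pi))
        r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        if labeled_idx is not None:
            r[labeled_idx] = 0.0
            r[labeled_idx, labels[labeled_idx]] = 1.0
        # M-step: weighted maximum-likelihood parameter updates.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return r.argmax(axis=1)

z_unsup = em_gmm(x)
z_semi = em_gmm(x, labeled_idx, z_true)

def accuracy(z_hat):
    # Account for label switching: take the better of the two matchings.
    return max(np.mean(z_hat == z_true), np.mean((1 - z_hat) == z_true))

print(f"unsupervised accuracy:    {accuracy(z_unsup):.3f}")
print(f"semi-supervised accuracy: {accuracy(z_semi):.3f}")
```

Clamping the responsibilities of labeled points is the usual way to fold labeled "additional data" into EM; a fully Bayesian treatment would instead integrate over the parameters, and the paper's point is that the benefit of such additional data must be weighed against the higher-dimensional parameter of the enlarged model.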
