Relationship between class order and parameter approximation in unsupervised learning

Due to the growing number of unlabeled documents, it is becoming increasingly important to develop unsupervised methods that can automatically extract information. Topic models and neural networks are two such methods, and because their parameters cannot be computed exactly, parameter approximation algorithms are typically employed to estimate them. A well-known weakness of these approximation algorithms is that they do not find the global optimum but instead converge to one of many local optima. It is also known that the initialization of the parameters affects the outcome of the approximation process. In this paper, we hypothesize that the order in which data classes are presented is a further factor affecting the approximation results. Through digit recognition experiments on the MNIST data set, we show that this hypothesis holds, and we argue that fully shuffled data should always be used to avoid drawing incorrect conclusions.
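To make the hypothesis concrete, the following is a minimal sketch, not the paper's actual experimental setup, of how class order can change the result of iterative parameter estimation. It trains the same stochastic-gradient classifier twice, once on class-sorted data and once on shuffled data, using scikit-learn's small 8x8 digits set as a stand-in for MNIST; the two runs generally converge to different solutions with different test accuracies.

    # Sketch: compare class-sorted vs. shuffled presentation order.
    # Uses scikit-learn's 8x8 digits set as a stand-in for MNIST.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X = X / 16.0  # scale pixel values to [0, 1]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    def fit_in_order(X, y, order):
        # shuffle=False makes SGD visit samples in the given order each
        # epoch, so presentation order shapes the optimization path.
        clf = SGDClassifier(max_iter=20, shuffle=False,
                            random_state=0, tol=None)
        clf.fit(X[order], y[order])
        return clf

    sorted_order = np.argsort(y_tr)  # all 0s first, then 1s, ...
    shuffled_order = np.random.RandomState(0).permutation(len(y_tr))

    acc_sorted = fit_in_order(X_tr, y_tr, sorted_order).score(X_te, y_te)
    acc_shuffled = fit_in_order(X_tr, y_tr, shuffled_order).score(X_te, y_te)
    print(f"class-sorted: {acc_sorted:.3f}  shuffled: {acc_shuffled:.3f}")

Because shuffle=False forces the optimizer to visit samples in the given order, the class-sorted run sees long homogeneous blocks of each digit, which biases the optimization path toward a different local solution than the one reached on shuffled data.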
