A new dual wing harmonium model for document retrieval

A new dual wing harmonium model that integrates term frequency features and term connection features into a low dimensional semantic space without increase of computation load is proposed for the application of document retrieval. Terms and vectorized graph connectionists are extracted from the graph representation of document by employing weighted feature extraction method. We then develop a new dual wing harmonium model projecting these multiple features into low dimensional latent topics with different probability distributions assumption. Contrastive divergence algorithm is used for efficient learning and inference. We perform extensive experimental verification, and the comparative results suggest that the proposed method is accurate and computationally efficient for document retrieval.

[1]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[2]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[3]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[4]  Ramon Ferrer i Cancho,et al.  The small world of human language , 2001, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[5]  Jen-Tzung Chien,et al.  Adaptive Bayesian Latent Semantic Analysis , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Peter V. Gehler,et al.  The rate adapting poisson model for information retrieval and object recognition , 2006, ICML.

[7]  Abraham Kandel,et al.  Classification Of Web Documents Using Graph Matching , 2004, Int. J. Pattern Recognit. Artif. Intell..

[8]  Michael I. Jordan,et al.  A generalized mean field algorithm for variational inference in exponential families , 2002, UAI.

[9]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[10]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[11]  Abraham Kandel,et al.  Classification of Web documents using a graph model , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[12]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[13]  S N Dorogovtsev,et al.  Language as an evolving word web , 2001, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[14]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[15]  Zoubin Ghahramani,et al.  Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms , 2004, UAI.

[16]  Rong Yan,et al.  Harmonium Models for Video Classification , 2008 .

[17]  Geoffrey E. Hinton,et al.  A New Learning Algorithm for Mean Field Boltzmann Machines , 2002, ICANN.

[18]  M. F. Porter,et al.  An algorithm for suffix stripping , 1997 .

[19]  Andrew P. Sage,et al.  Uncertainty in Artificial Intelligence , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[20]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[21]  Tommy W. S. Chow,et al.  Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection , 2009, IEEE Transactions on Neural Networks.

[22]  Alistair Moffat,et al.  Exploring the similarity space , 1998, SIGF.