Dirichlet Aspect Weighting: A Generalized EM Algorithm for Integrating External Data Fields with Semantically Structured Queries by Using Gradient Projection Method

In this paper we address the problem of document retrieval with semantically structured queries, that is, queries in which each term is tagged with a field label. We introduce the Dirichlet Aspect Weighting model, which integrates terms from external databases into the query language model within a Bayesian learning framework. In this model, the Dirichlet prior distribution is governed by parameters that depend on the number of fields in the external databases. The model requires the semantically structured query to be augmented with additional examples, which we obtain through pseudo relevance feedback. We formulate a log-likelihood function for the Dirichlet Aspect Weighting model and maximize it using a novel Generalized EM algorithm. On the TREC 2005 Genomics Track dataset, the Dirichlet Aspect Weighting model shows an improvement over baseline methods that use pseudo relevance feedback and incorporate terms from external databases.
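The abstract names the ingredients: a query model that weights several aspects, a Dirichlet prior, a log-likelihood maximized by a Generalized EM algorithm, and gradient projection. It does not give the exact update. The block below is a minimal sketch under stated assumptions, not the paper's method. Each aspect is a hypothetical smoothed term distribution (the original query, or one external-database field). The feedback data are term counts from pseudo-relevant documents. The aspect weights live on the probability simplex. The Dirichlet hyperparameters `alpha` are set by hand, not derived from the number of fields. The E-step computes expected aspect counts. The partial M-step takes one projected gradient step on the simplex and accepts it only if the M-step objective does not decrease, which is the property a Generalized EM algorithm relies on. All function names here are illustrative.

```python
import math

# Sketch only: a generic GEM loop whose partial M-step is one projected-gradient
# step on the probability simplex. Aspect names and data below are hypothetical.

def project_to_simplex(v, total=1.0):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = total} (sort-based)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - total) / j
        if uj - t > 0:  # the largest such j determines the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]

def project_with_floor(v, eps):
    """Project onto the simplex while keeping every weight >= eps, so log(weight) stays finite."""
    shifted = project_to_simplex([x - eps for x in v], total=1.0 - len(v) * eps)
    return [x + eps for x in shifted]

def log_posterior(weights, counts, aspect_models, alpha):
    """Mixture log-likelihood of feedback term counts plus Dirichlet log-prior (up to a constant)."""
    ll = sum(c * math.log(sum(lam * p[w] for lam, p in zip(weights, aspect_models)))
             for w, c in counts.items())
    return ll + sum((a - 1.0) * math.log(lam) for a, lam in zip(alpha, weights))

def m_step_objective(coef, weights):
    """M-step objective Q(lambda) up to a constant: sum_k coef_k * log(lambda_k)."""
    return sum(c * math.log(lam) for c, lam in zip(coef, weights))

def gem_gradient_projection(counts, aspect_models, alpha, iters=50, step=0.01, eps=1e-6):
    k = len(aspect_models)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: expected number of feedback tokens generated by each aspect.
        expected = [0.0] * k
        for w, c in counts.items():
            mix = [lam * p[w] for lam, p in zip(weights, aspect_models)]
            z = sum(mix)
            for i in range(k):
                expected[i] += c * mix[i] / z
        # Coefficients of log(lambda_k) in the M-step objective, Dirichlet prior included.
        coef = [e + a - 1.0 for e, a in zip(expected, alpha)]
        grad = [c / lam for c, lam in zip(coef, weights)]
        q_old = m_step_objective(coef, weights)
        # Partial M-step: one projected gradient step, halved until Q does not decrease.
        eta = step
        while eta > 1e-12:
            cand = project_with_floor([lam + eta * g for lam, g in zip(weights, grad)], eps)
            if m_step_objective(coef, cand) >= q_old:
                weights = cand
                break
            eta /= 2.0
    return weights

# Hypothetical example: aspect 0 = original query model, aspect 1 = one external-database field.
models = [
    {"gene": 0.6, "protein": 0.3, "mouse": 0.1},
    {"gene": 0.1, "protein": 0.2, "mouse": 0.7},
]
counts = {"gene": 2, "protein": 2, "mouse": 8}  # term counts from pseudo-relevant documents
alpha = [2.0, 2.0]                              # Dirichlet hyperparameters (assumed, not derived)
weights = gem_gradient_projection(counts, models, alpha)
print(weights)
```

Because every accepted step does not decrease the M-step objective, the log-posterior does not decrease across iterations. With these made-up counts, which are dominated by "mouse", the weight on aspect 1 should end up larger than the weight on aspect 0. The step size, iteration count, and floor `eps` are arbitrary illustrative choices.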
