Learning to rank with (a lot of) word features

In this article we present Supervised Semantic Indexing which defines a class of nonlinear (quadratic) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained from a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as cross-language retrieval or online advertising placement. Dealing with models on all pairs of words features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low rank (but diagonal preserving) representations, correlated feature hashing and sparsification. We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.

[1]  Editors , 1986, Brain Research Bulletin.

[2]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[3]  Thomas L. Griffiths,et al.  Advances in Neural Information Processing Systems 21 , 1993, NIPS 2009.

[4]  Vasileios Hatzivassiloglou,et al.  Translating Collocations for Bilingual Lexicons: A Statistical Approach , 1996, CL.

[5]  R McKeownKathleen,et al.  Translating collocations for bilingual lexicons , 1996 .

[6]  Michael L. Littman,et al.  Automatic Cross-Language Retrieval Using Latent Semantic Indexing , 1997 .

[7]  Gregory Grefenstette,et al.  Cross-Language Information Retrieval , 1998, The Springer International Series on Information Retrieval.

[8]  John D. Lafferty,et al.  Information Retrieval as Statistical Translation , 2017 .

[9]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[10]  Rich Caruana,et al.  Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.

[11]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[12]  Nello Cristianini,et al.  Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis , 2002, NIPS.

[13]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[14]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[15]  Ellen M. Voorhees,et al.  Overview of the TREC 2004 Novelty Track. , 2005 .

[16]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[17]  Wei-Ying Ma,et al.  Supervised latent semantic indexing for document categorization , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[18]  Samy Bengio,et al.  Links between perceptrons, MLPs and SVMs , 2004, ICML.

[19]  Maria Ruiz-Casado,et al.  Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia , 2005, NLDB.

[20]  Samy Bengio,et al.  Inferring document similarity from hyperlinks , 2005, CIKM '05.

[21]  Samy Bengio,et al.  A Neural Network for Text Representation , 2005, ICANN.

[22]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[23]  Peter V. Gehler,et al.  The rate adapting poisson model for information retrieval and object recognition , 2006, ICML.

[24]  Masoud Nikravesh,et al.  Feature Extraction - Foundations and Applications , 2006, Feature Extraction.

[25]  Wolfgang Nejdl,et al.  Extracting Semantics Relationships between Wikipedia Categories , 2006, SemWiki.

[26]  Zheng Chen,et al.  Latent semantic analysis for multiple-type interrelated data objects , 2006, SIGIR.

[27]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[28]  Masoud Nikravesh,et al.  Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[29]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[30]  Thomas Hofmann,et al.  Learning to Rank with Nonsmooth Cost Functions , 2006, NIPS.

[31]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[32]  Lehel Csató,et al.  Wikipedia-Based Kernels for Text Categorization , 2007, Ninth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2007).

[33]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[34]  Amir Globerson,et al.  Visualizing pairwise similarity via semidefinite programming , 2007, AISTATS.

[35]  Sutanu Chakraborti,et al.  Supervised Latent Semantic Indexing Using Adaptive Sprinkling , 2007, IJCAI.

[36]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[37]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[38]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[39]  Tao Qin,et al.  LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval , 2007 .

[40]  Ian H. Witten,et al.  A knowledge-based search engine powered by wikipedia , 2007, CIKM '07.

[41]  Inderjit S. Dhillon,et al.  Online Metric Learning and Fast Similarity Search , 2008, NIPS.

[42]  John Langford,et al.  Predictive Indexing for Fast Search , 2008, NIPS.

[43]  David Grangier,et al.  A Discriminative Kernel-based Approach to Retrieval Images from Text Queries , 2008 .

[44]  Hua Li,et al.  Enhancing text clustering by leveraging Wikipedia semantics , 2008, SIGIR '08.

[45]  Oren Kurland,et al.  Query-drift prevention for robust query expansion , 2008, SIGIR '08.

[46]  John Langford,et al.  Sparse Online Learning via Truncated Gradient , 2008, NIPS.

[47]  Kilian Q. Weinberger,et al.  Fast solvers and efficient implementations for distance metric learning , 2008, ICML '08.

[48]  John Langford,et al.  Hash Kernels , 2009, AISTATS.

[49]  Jason Weston,et al.  Supervised Semantic Indexing , 2009, ECIR.

[50]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..