A Discriminative Model for Query Spelling Correction with Latent Structural SVM

Discriminative training in query spelling correction is difficult due to the complex internal structures of the data. Recent work on query spelling correction suggests a two-stage approach: a noisy channel model is used to retrieve a number of candidate corrections, and a discriminatively trained ranker is then applied to these candidates. The ranker, however, is limited by the low recall of the first, suboptimal search stage. This paper proposes to directly optimize the search stage with a discriminative model based on latent structural SVM. In this model, we treat query spelling correction as a multiclass classification problem with structured input and output. The latent structural information is used to model the alignment of words in the spelling correction process. Experimental results show that, as a standalone speller, our model outperforms all the baseline systems. It also attains a higher recall than the noisy channel model, and can therefore serve as a better filtering stage when combined with a ranker.
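To make the latent structural SVM formulation concrete, the Python sketch below scores a candidate correction y for a query x jointly with a hidden word alignment h and predicts the pair maximizing w · Φ(x, y, h). It is an illustration under our own assumptions, not the paper's implementation: the alignment units (word-to-word, two-word merge, one-word split), the five features in the hypothetical `phi`, the 0/1 loss, and the use of plain subgradient descent (rather than the concave-convex procedure commonly used to train latent structural SVMs) are all choices made here for brevity. Training imputes the best alignment for the gold correction, runs loss-augmented inference over the candidates, and moves w toward the gold features.

```python
# Minimal latent structural SVM sketch for query spelling correction.
# Assumptions (hypothetical, not from the paper): alignment UNITS, the feature
# set in phi, the 0/1 loss, and subgradient descent instead of CCCP.

UNITS = [(1, 1), (2, 1), (1, 2)]  # word-to-word, merge two words, split one word


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def alignments(n, m, i=0, j=0):
    """Enumerate monotone alignments h of n query words to m correction words."""
    if i == n and j == m:
        yield ()
        return
    for di, dj in UNITS:
        if i + di <= n and j + dj <= m:
            for rest in alignments(n, m, i + di, j + dj):
                yield ((i, di, j, dj),) + rest


def phi(x, y, h, lexicon):
    """Joint feature vector Phi(x, y, h); the five features are hypothetical."""
    f = [0.0] * 5
    for i, di, j, dj in h:
        d = edit_distance("".join(x[i:i + di]), "".join(y[j:j + dj]))
        f[0] += float(d == 0)   # aligned unit copied unchanged
        f[1] += d               # character edits inside the unit
        f[2] += float(di == 2)  # two query words merged
        f[3] += float(dj == 2)  # one query word split
    f[4] = float(sum(tok in lexicon for tok in y))  # in-lexicon output words
    return f


def dot(w, f):
    return sum(a * b for a, b in zip(w, f))


def best_alignment(w, x, y, lexicon):
    """Max over the latent alignment h of w . Phi(x, y, h) for a fixed output y."""
    return max(((dot(w, phi(x, y, h, lexicon)), h)
                for h in alignments(len(x), len(y))),
               default=(float("-inf"), None))


def predict(w, x, candidates, lexicon, gold=None):
    """Argmax over (y, h); adds a 0/1 loss term when gold is given."""
    best = None
    for y in candidates:
        s, h = best_alignment(w, x, y, lexicon)
        if h is None:
            continue  # y cannot be aligned to x with the assumed UNITS
        if gold is not None:
            s += float(y != gold)  # loss-augmented inference
        if best is None or s > best[0]:
            best = (s, y, h)
    return best


def train(data, lexicon, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the regularized latent structural hinge loss."""
    w = [0.0] * 5
    for _ in range(epochs):
        for x, gold, candidates in data:
            _, h_star = best_alignment(w, x, gold, lexicon)  # impute gold alignment
            _, y_hat, h_hat = predict(w, x, candidates, lexicon, gold)
            g_hat = phi(x, y_hat, h_hat, lexicon)
            g_star = phi(x, gold, h_star, lexicon)
            w = [wi - lr * (lam * wi + a - b)
                 for wi, a, b in zip(w, g_hat, g_star)]
    return w


if __name__ == "__main__":
    lexicon = {"face", "book", "facebook"}
    x = ["face", "bok"]
    candidates = [["face", "book"], ["facebook"], ["face", "bok"]]
    w = train([(x, ["face", "book"], candidates)], lexicon)
    print(predict(w, x, candidates, lexicon)[1])  # ['face', 'book']
```

The sketch assumes the gold correction, and at least one candidate, can be aligned to the query with the chosen units; in a real speller the candidate set would come from the search stage rather than being listed by hand, and exhaustive enumeration of alignments would be replaced by dynamic programming.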
