Learning Weighted Entity Lists from Web Click Logs for Spoken Language Understanding

Named entity lists provide important features for language understanding, but typical lists can contain many ambiguous or incorrect phrases. We present an approach for automatically learning weighted entity lists by mining user clicks from web search logs. The approach significantly outperforms multiple baseline approaches, and the weighted lists improve spoken language understanding tasks such as domain detection and slot filling. Our methods are general and can easily be applied to large numbers of entities across any number of lists.

Index Terms: spoken language understanding, domain detection, slot filling, named entity lists, click logs
