Exploiting query click logs for utterance domain detection in spoken language understanding

In this paper, we describe methods that exploit search queries mined from search engine query logs to improve domain detection in spoken language understanding. We propose extending the label propagation algorithm, a graph-based semi-supervised learning approach, to incorporate noisy domain information estimated from the search engine links that users click after issuing their queries. The main contributions of our work are the use of search query logs for domain classification, the integration of noisy supervision into the semi-supervised label propagation algorithm, and the sampling of high-quality query click data by mining query logs and using classification confidence scores. We show that most of the semi-supervised learning methods we experimented with improve on supervised training, and that the largest improvement is achieved by label propagation with noisy supervision. We reduce the error rate of domain detection by 20% relative, from 6.2% to 5.0%.
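The idea of label propagation with noisy supervision can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name `propagate_labels`, the interpolation weight `alpha`, the way the noisy prior is mixed in, and the toy graph are all assumptions made for illustration. Labeled utterances are clamped to their known domains, while unlabeled queries receive a mix of their neighbors' domain distributions and a noisy click-derived prior.

```python
import numpy as np

def propagate_labels(weights, labels, is_labeled, noisy_prior=None,
                     alpha=0.2, n_iter=100):
    """Label propagation with an optional noisy prior (illustrative only).

    weights:     (n, n) nonnegative similarity matrix, every row sum > 0
    labels:      (n, k) domain distributions; only labeled rows are used
    is_labeled:  (n,) boolean mask of nodes with trusted labels
    noisy_prior: (n, k) noisy domain estimates (e.g. from clicks), rows sum to 1
    alpha:       hypothetical weight given to the noisy prior
    """
    weights = np.asarray(weights, dtype=float)
    is_labeled = np.asarray(is_labeled, dtype=bool)
    dist = np.asarray(labels, dtype=float).copy()
    n_domains = dist.shape[1]

    # Row-normalize similarities into transition probabilities.
    transition = weights / weights.sum(axis=1, keepdims=True)

    # Start unlabeled nodes from a uniform distribution over domains.
    dist[~is_labeled] = 1.0 / n_domains
    clamped = dist[is_labeled].copy()

    for _ in range(n_iter):
        # Each node takes the weighted average of its neighbors' distributions.
        dist = transition @ dist
        if noisy_prior is not None:
            # Assumed mixing rule: pull unlabeled nodes toward the noisy prior.
            prior = np.asarray(noisy_prior, dtype=float)
            dist[~is_labeled] = ((1 - alpha) * dist[~is_labeled]
                                 + alpha * prior[~is_labeled])
        dist /= dist.sum(axis=1, keepdims=True)
        # Trusted labels are reset after every step.
        dist[is_labeled] = clamped
    return dist

# Toy chain graph: nodes 0 and 3 are labeled, nodes 1 and 2 are unlabeled.
weights = np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.1, 0.0],
                    [0.0, 0.1, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
labels = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
is_labeled = np.array([True, False, False, True])
noisy_prior = np.array([[0.5, 0.5], [0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])

result = propagate_labels(weights, labels, is_labeled, noisy_prior)
predicted_domains = result.argmax(axis=1)
```

In this toy graph, node 1 is strongly connected to the labeled node 0 and node 2 to the labeled node 3, so each unlabeled node ends up in the domain of its closest labeled neighbor. The noisy prior only shifts the resulting distributions, and how strongly it does so depends on the assumed `alpha`.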
