Role-explicit query identification and intent role annotation

Understanding the information need, or intent, encoded in a query has long been regarded as essential to effective information retrieval. To improve query representation and understanding, we introduce two intent roles, kernel-object and modifier, to structurally parse a class of role-explicit queries, which constitute a majority of common user queries. We focus on two research problems. RP-1: given a role-explicit query, how to identify its kernel-object and modifier, a task we call intent role annotation. RP-2: how to determine whether an arbitrary query is role-explicit. To solve RP-1, we propose a simplified word n-gram role model (SWNR), which quantifies the generating probability of a role-explicit query and performs intent role annotation effectively. To address RP-2, we build supervised classifiers using a set of discriminative features. The experimental results show that: (1) SWNR achieves satisfactory performance, more than 73% across different metrics; (2) the classifiers achieve more than 90% precision in identifying role-explicit queries; (3) compared with traditional techniques for query representation and understanding, e.g., named entity recognition in query and class-level query intent inference, intent role annotation provides a more flexible framework, and a number of applications, such as intent mining and diversified document ranking, can benefit from annotating role-explicit queries.
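To make the idea of intent role annotation concrete, the following is a minimal sketch and not the SWNR model itself, whose formulation the abstract does not give. It assumes a much simpler role-conditioned unigram model with add-one smoothing, and it assumes the kernel-object and the modifier are each a contiguous span of the query. The training pairs, the function names (`train`, `log_prob`, `annotate`), and the role labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data: (kernel-object tokens, modifier tokens).
TRAINING = [
    (["harry", "potter"], ["movie", "review"]),
    (["iphone"], ["price"]),
    (["harry", "potter"], ["book"]),
    (["canon", "camera"], ["price"]),
    (["iphone"], ["review"]),
]


def train(examples):
    """Count how often each word appears in each role."""
    counts = {"kernel": Counter(), "modifier": Counter()}
    for kernel, modifier in examples:
        counts["kernel"].update(kernel)
        counts["modifier"].update(modifier)
    vocab = set(counts["kernel"]) | set(counts["modifier"])
    return counts, vocab


def log_prob(word, role, counts, vocab):
    """Add-one smoothed log P(word | role); one extra slot for unseen words."""
    total = sum(counts[role].values())
    return math.log((counts[role][word] + 1) / (total + len(vocab) + 1))


def annotate(tokens, counts, vocab):
    """Return the (kernel-object, modifier) split with the highest score.

    Assumption: each role is one contiguous span, so every split point is
    tried with the kernel-object on either side.
    """
    if len(tokens) < 2:
        # A single-word query is treated as a bare kernel-object.
        return list(tokens), []
    best = None
    for i in range(1, len(tokens)):
        left, right = tokens[:i], tokens[i:]
        for kernel, modifier in ((left, right), (right, left)):
            score = sum(log_prob(w, "kernel", counts, vocab) for w in kernel)
            score += sum(log_prob(w, "modifier", counts, vocab) for w in modifier)
            if best is None or score > best[0]:
                best = (score, kernel, modifier)
    return best[1], best[2]


counts, vocab = train(TRAINING)
print(annotate(["price", "canon", "camera"], counts, vocab))
```

On this toy data the query "price canon camera" is annotated with "canon camera" as the kernel-object and "price" as the modifier. A word n-gram model such as the one the abstract names would presumably also condition on neighboring words, which this unigram sketch ignores.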
