Acquiring knowledge about human goals from Search Query Logs

A better understanding of what motivates humans to perform certain actions is relevant for a range of research challenges, including generating action sequences that implement goals (planning). A first step in this direction is the task of acquiring knowledge about human goals. In this work, we investigate whether Search Query Logs are a viable source for extracting expressions of human goals. For this purpose, we devise an algorithm that automatically identifies queries containing explicit goals, such as "find home to rent in Florida". In our evaluation, the algorithm achieves useful precision and recall values. We apply the classification algorithm to two large Search Query Logs, recorded by AOL and Microsoft Research in 2006, and obtain a set of approximately 110,000 queries containing explicit goals. To study the nature of human goals in Search Query Logs, we conduct qualitative, quantitative and comparative analyses. Our findings suggest that Search Query Logs (i) represent a viable source for extracting human goals, (ii) contain a great variety of human goals and (iii) contain human goals that can be employed to complement existing commonsense knowledge bases. Finally, we illustrate the potential of goal knowledge by addressing the following application scenario: refining and extending commonsense knowledge with human goals from Search Query Logs. This work is relevant for (i) knowledge engineers interested in acquiring human goals from textual corpora and constructing knowledge bases of human goals, and (ii) researchers interested in studying characteristics of human goals in Search Query Logs.
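
To make the notion of a query "containing an explicit goal" more concrete, the following is a minimal, purely illustrative sketch. It is not the classification algorithm described above, whose details the abstract does not give; it is a toy heuristic that flags queries starting with an action verb. The verb list `GOAL_VERBS` and the function name `contains_explicit_goal` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical seed list of action verbs; NOT taken from the paper.
GOAL_VERBS = {"find", "buy", "get", "learn", "make", "rent", "lose", "sell"}


def contains_explicit_goal(query: str) -> bool:
    """Toy heuristic: a leading action verb followed by at least one more token.

    This is an assumption-laden stand-in for a real classifier, which would
    typically rely on richer linguistic features and labeled training data.
    """
    tokens = re.findall(r"[a-z']+", query.lower())
    if len(tokens) < 2:
        return False
    # Allow an optional "how to" or "to" prefix before the verb.
    if tokens[:2] == ["how", "to"]:
        tokens = tokens[2:]
    elif tokens[0] == "to":
        tokens = tokens[1:]
    return len(tokens) >= 2 and tokens[0] in GOAL_VERBS


queries = [
    "find home to rent in Florida",
    "florida weather",
    "how to lose weight",
    "ebay",
]
goal_queries = [q for q in queries if contains_explicit_goal(q)]
```

Under this heuristic, a verb-led query such as "find home to rent in Florida" is kept, while a purely topical or navigational query such as "florida weather" or "ebay" is discarded. A keyword rule like this would miss many goal expressions and is intended only to illustrate the distinction the abstract draws.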
