Subtask Mining from Search Query Logs for How-Knowledge Acceleration

How-knowledge is indispensable in daily life, but publicly available knowledge bases contain less of it, and at lower quality, than what-knowledge. This paper first extracts task-subtask pairs from wikiHow, then mines linguistic patterns from search query logs, and finally applies the mined patterns to extract the subtasks needed to complete given how-to tasks. To evaluate the proposed methodology, we group tasks and their recommended subtasks into pairs and evaluate the results both automatically and manually. The automatic evaluation yields an accuracy of 0.4494. We also classify the mined patterns by preposition and find that patterns with prepositions such as “on”, “to”, and “with” perform better than the others. The results can be used to accelerate how-knowledge base construction.
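The three-stage pipeline described above (seed pairs, pattern mining, pattern application) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `mine_patterns` and `extract_subtasks`, the plain string-template representation of a pattern, and the toy seed pairs and queries are all assumptions made for illustration only.

```python
import re
from collections import Counter

def mine_patterns(seed_pairs, queries):
    """Count templates formed by replacing a known task and subtask in a query with slots."""
    patterns = Counter()
    for task, subtask in seed_pairs:
        for q in queries:
            if task in q and subtask in q:
                template = q.replace(task, "<TASK>").replace(subtask, "<SUBTASK>")
                patterns[template] += 1
    return patterns

def extract_subtasks(task, patterns, queries, min_count=1):
    """Apply mined templates to queries and collect candidate subtasks for a new task."""
    found = set()
    for template, count in patterns.items():
        if count < min_count:
            continue
        # Turn the template into a regex: fix the task, capture the subtask slot.
        regex = re.escape(template)
        regex = regex.replace(re.escape("<TASK>"), re.escape(task))
        regex = regex.replace(re.escape("<SUBTASK>"), r"(?P<sub>.+)")
        for q in queries:
            m = re.fullmatch(regex, q)
            if m:
                found.add(m.group("sub"))
    return found

# Hypothetical seed pairs (standing in for wikiHow task-subtask pairs).
seed_pairs = [
    ("bake a cake", "preheat the oven"),
    ("change a tire", "loosen the lug nuts"),
]

# Hypothetical query log entries.
queries = [
    "preheat the oven to bake a cake",
    "loosen the lug nuts to change a tire",
    "knead the dough to make pizza",
    "make pizza near me",
]

patterns = mine_patterns(seed_pairs, queries)
result = extract_subtasks("make pizza", patterns, queries)
```

In this toy setting both seed pairs yield the same template built around the preposition "to", and applying it to the unseen task "make pizza" recovers "knead the dough" as a candidate subtask. A real system would likely need tokenization, pattern generalization, and filtering that this sketch omits.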
