Distilling Task Knowledge from How-To Communities

Knowledge graphs have become a fundamental asset for search engines. A fair number of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps by tapping the contents of online communities such as WikiHow. We employ Open-IE techniques to extract noisy candidates for tasks, steps, and the required tools and other items. To clean and properly organize this data, we devise embedding-based clustering techniques. The resulting knowledge base, HowToKB, includes a hierarchical taxonomy of disambiguated tasks, temporal orders of sub-tasks, and attributes for involved items. A comprehensive evaluation of HowToKB shows high accuracy. As an extrinsic use case, we evaluate the automatic retrieval of related YouTube videos for HowToKB tasks.
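The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of what embedding-based grouping of noisy task phrases could look like. It assumes phrase vectors obtained by averaging word vectors and a single-link merge rule with a cosine-similarity threshold; both are illustrative choices, not the paper's confirmed method. The names `phrase_vector`, `single_link_clusters`, and `TOY_VECTORS` are hypothetical, and the three-dimensional vectors are invented for the example rather than taken from a trained model.

```python
import math

# Invented toy word vectors (assumption): a real system would load trained embeddings.
TOY_VECTORS = {
    "paint": [1.0, 0.0, 0.0],
    "coat": [0.9, 0.1, 0.0],
    "wall": [0.8, 0.2, 0.0],
    "fix": [0.0, 1.0, 0.0],
    "repair": [0.0, 0.9, 0.1],
    "bicycle": [0.1, 0.8, 0.1],
    "bike": [0.1, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(phrase, word_vectors):
    """Average the vectors of the known words in a phrase; None if no word is known."""
    vecs = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def single_link_clusters(phrases, word_vectors, threshold=0.8):
    """Merge phrases connected by a chain of pairwise similarities >= threshold.

    Cutting a single-link hierarchy at a fixed threshold is equivalent to taking
    connected components of the thresholded similarity graph, done here with
    a small union-find structure.
    """
    vectors = [phrase_vector(p, word_vectors) for p in phrases]
    parent = list(range(len(phrases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if vectors[i] is None or vectors[j] is None:
                continue  # phrases without any known word stay in their own cluster
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, phrase in enumerate(phrases):
        groups.setdefault(find(i), []).append(phrase)
    return list(groups.values())


if __name__ == "__main__":
    candidates = ["paint a wall", "coat the wall", "repair a bicycle", "fix the bike"]
    for cluster in single_link_clusters(candidates, TOY_VECTORS):
        print(cluster)
```

With these toy vectors, the two painting phrases end up in one group and the two bicycle phrases in another. The threshold of 0.8 is arbitrary; in practice it would need tuning on held-out data.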
