Toward the automatic extraction of knowledge of usable goods

Knowledge of usable goods (e.g., a toothbrush is used to clean teeth, and a treadmill is used for exercise) is ubiquitous and in constant demand. This study proposes semantic labels that capture aspects of knowledge of usable goods and builds a benchmark corpus, the Usable Goods Corpus, to explore this new semantic labeling task. Our human annotation experiment shows that human annotators can generally identify pieces of information about usable goods in text. Our first attempt at automatically identifying such knowledge shows that a model based on conditional random fields approaches the level of human annotation (F score 73.2%). Together, these results suggest future directions for building a large-scale corpus and improving the automatic identification of knowledge of usable goods.
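The abstract reports an F score for the CRF model but does not state here how it is computed. As a minimal sketch, assuming the semantic labels are encoded as BIO tags and scored by exact span match (a common convention for CRF sequence labeling, not confirmed by the source), the following computes span-level precision, recall, and F1. The label names `GOODS` and `PURPOSE`, and the helper names `bio_to_spans` and `span_f1`, are hypothetical placeholders, not the corpus's actual label set or the authors' code.

```python
from typing import List, Optional, Set, Tuple

# A labeled span: (start token index, end token index exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))  # close the open span
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag (no matching open span) starts a new span.
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 over a list of sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)  # spans with identical boundaries and label
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Tokens: A toothbrush is used to clean the teeth
# GOODS / PURPOSE are hypothetical labels for illustration only.
gold = [["O", "B-GOODS", "O", "O", "O", "B-PURPOSE", "I-PURPOSE", "I-PURPOSE"]]
pred = [["O", "B-GOODS", "O", "O", "O", "B-PURPOSE", "O", "O"]]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the prediction recovers only the first token of the PURPOSE span and so earns no credit for it, even though 6 of 8 tokens are tagged correctly; exact-match span scoring is therefore stricter than token-level accuracy.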
