A Framework for Compiling High Quality Knowledge Resources From Raw Corpora

Identifying various types of relations is a necessary step toward enabling computers to understand natural language text. In particular, clarifying the relations between predicates and their arguments is essential, because predicate-argument structures convey most of the information in natural language. Capturing these relations precisely requires wide-coverage knowledge resources. Such resources can be derived from automatic parses of raw corpora, but parsing accuracy is still not high enough for precise knowledge acquisition. We present a framework for compiling high-quality knowledge resources from raw corpora. The framework selects high-quality dependency relations from automatic parses and uses them not only to calculate distributional similarity but also to acquire knowledge such as case frames.
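The selection-then-similarity idea can be illustrated with a minimal sketch. It is not the authors' implementation: the confidence scores, the fixed threshold, the toy data, and the function names (`select_reliable`, `context_vectors`, `cosine`) are all assumptions made for illustration, and plain cosine similarity over raw counts stands in for whatever similarity measure the framework actually uses.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical parser output: (head, relation, modifier, confidence).
# The confidence value is an assumed stand-in for a reliability estimate
# attached to each automatically parsed dependency.
PARSES = [
    ("eat", "obj", "apple", 0.95),
    ("eat", "obj", "bread", 0.90),
    ("eat", "obj", "idea", 0.20),  # simulated parse error
    ("buy", "obj", "apple", 0.92),
    ("buy", "obj", "bread", 0.88),
    ("propose", "obj", "idea", 0.97),
]


def select_reliable(parses, threshold=0.8):
    """Keep only dependencies whose confidence meets the (assumed) threshold."""
    return [(h, r, m) for h, r, m, c in parses if c >= threshold]


def context_vectors(triples):
    """Map each modifier word to counts of its (relation, head) contexts."""
    vectors = defaultdict(Counter)
    for head, rel, mod in triples:
        vectors[mod][(rel, head)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Similarity computed from all parses versus from the selected subset.
all_vecs = context_vectors([(h, r, m) for h, r, m, _ in PARSES])
sel_vecs = context_vectors(select_reliable(PARSES))

noisy = cosine(all_vecs["apple"], all_vecs["idea"])
clean = cosine(sel_vecs["apple"], sel_vecs["idea"])
```

In this toy data the simulated parse error gives "apple" and "idea" a shared context, so they look partly similar when every parse is used; after low-confidence dependencies are dropped, that spurious overlap disappears.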
