Relational feature engineering of natural language processing

We present a new framework for feature engineering in natural language processing that is based on a relational data model of text. The framework provides fast and flexible methods for implementing and extracting new features, which reduces the effort of building an NLP system for a particular task. We instantiated and evaluated the framework on coreference resolution in multiple languages and obtained competitive results within a short implementation period, which demonstrates its potential for feature engineering.
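To make the idea of a relational data model of text concrete, the following is a minimal sketch and not the framework described above. It assumes a hypothetical two-table schema (`token` and `mention`) stored in SQLite, and it expresses two illustrative pairwise coreference features (`head_match` and `sentence_distance`) as a single SQL query. All table names, column names, feature names, and sample data are assumptions made for this illustration.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical schema for text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE token (
            token_id    INTEGER PRIMARY KEY,
            sentence_id INTEGER,
            position    INTEGER,
            word        TEXT,
            pos         TEXT
        );
        CREATE TABLE mention (
            mention_id  INTEGER PRIMARY KEY,
            first_token INTEGER REFERENCES token(token_id),
            last_token  INTEGER REFERENCES token(token_id),
            head_token  INTEGER REFERENCES token(token_id)
        );
    """)
    # Toy data: three short sentences with one mention each.
    tokens = [
        (1, 1, 1, "The", "DT"),
        (2, 1, 2, "president", "NN"),
        (3, 1, 3, "spoke", "VBD"),
        (4, 2, 1, "The", "DT"),
        (5, 2, 2, "President", "NNP"),
        (6, 2, 3, "left", "VBD"),
        (7, 3, 1, "She", "PRP"),
        (8, 3, 2, "smiled", "VBD"),
    ]
    mentions = [(1, 1, 2, 2), (2, 4, 5, 5), (3, 7, 7, 7)]
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?, ?)", tokens)
    conn.executemany("INSERT INTO mention VALUES (?, ?, ?, ?)", mentions)
    return conn


# Each feature is one column of a query over mention pairs, so adding a
# feature means adding a column expression rather than new traversal code.
PAIR_FEATURES = """
SELECT a.mention_id,
       b.mention_id,
       LOWER(ha.word) = LOWER(hb.word)  AS head_match,
       hb.sentence_id - ha.sentence_id  AS sentence_distance
FROM mention a
JOIN mention b ON a.mention_id < b.mention_id
JOIN token ha  ON ha.token_id = a.head_token
JOIN token hb  ON hb.token_id = b.head_token
ORDER BY a.mention_id, b.mention_id
"""


def pair_features(conn):
    """Return (mention_a, mention_b, head_match, sentence_distance) rows."""
    return conn.execute(PAIR_FEATURES).fetchall()


if __name__ == "__main__":
    for row in pair_features(build_db()):
        print(row)
```

In this sketch the rows could be passed directly to a classifier as feature vectors for mention pairs. Whether the actual framework uses SQL, a custom query language, or another mechanism is not stated in the abstract.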
