Pattern Mining for Named Entity Recognition

Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results in the Ester2 French-language evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is difficult. In this paper, we present a novel approach that uses pattern mining both to perform NER and to supplement our system’s knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim at detecting Named Entity boundaries. We assess the efficiency of these patterns by using them in standalone mode and in combination with our existing system.
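To make the idea of mining patterns that detect entity boundaries more concrete, the following is a minimal Python sketch and not the mXS implementation. It assumes a toy corpus in which each token carries a surface form, a part-of-speech tag, and a flag saying whether an entity starts at that token. The "hierarchy" is reduced to two levels (surface form or POS tag), and the function name `mine_boundary_rules`, the thresholds, and the data layout are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def mine_boundary_rules(corpus, max_len=2, min_support=2, min_conf=0.75):
    """Enumerate left-context patterns that predict the start of an entity.

    corpus: list of sentences; each sentence is a list of
            (surface, pos_tag, starts_entity) triples (assumed layout).
    Returns {pattern: (support, confidence)} for patterns above both thresholds.
    """
    seen = Counter()  # times a pattern occurs just before any token
    hits = Counter()  # times it occurs just before an entity start
    for sentence in corpus:
        for i in range(1, len(sentence)):
            starts_entity = sentence[i][2]
            for n in range(1, min(max_len, i) + 1):
                window = sentence[i - n:i]
                # Each context token may be matched at either level of the
                # toy hierarchy: its surface form or its POS tag.
                levels = [(surface, pos) for surface, pos, _ in window]
                for pattern in product(*levels):
                    seen[pattern] += 1
                    if starts_entity:
                        hits[pattern] += 1
    rules = {}
    for pattern, support in hits.items():
        confidence = support / seen[pattern]
        if support >= min_support and confidence >= min_conf:
            rules[pattern] = (support, confidence)
    return rules


# Toy annotated corpus (invented for illustration, not Ester2 data).
corpus = [
    [("le", "DET", False), ("président", "NOUN", False),
     ("Chirac", "PROPN", True), ("parle", "VERB", False)],
    [("le", "DET", False), ("président", "NOUN", False),
     ("Obama", "PROPN", True), ("arrive", "VERB", False)],
    [("un", "DET", False), ("président", "NOUN", False),
     ("parle", "VERB", False)],
]

rules = mine_boundary_rules(corpus)
for pattern, (support, confidence) in sorted(rules.items()):
    print(pattern, support, confidence)
```

On this toy corpus only the patterns `("le", "président")` and `("le", "NOUN")` pass both thresholds, because the shorter context `("président",)` is also followed by a non-entity token in the third sentence. The sketch covers only start boundaries and contiguous left contexts; a real system would need to handle end boundaries, entity types, and a richer hierarchy.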
