Robust tree-structured Named Entities Recognition from speech

Named Entity Recognition (NER) is a well-known Natural Language Processing (NLP) task, used as a preliminary processing step that supplies semantic information to more complex tasks. Recently, a new set of named entities has been defined; it has a multilevel tree structure in which base entities are combined to define more complex ones. In this paper I describe an effective and original NER system that is robust to noisy speech inputs. It ranked first in the 2012 ETAPE NER evaluation campaign, with results far better than those of the other participating systems.
