Comparing Named Entity Recognition on Transcriptions and Written Texts

Recognizing named entities (e.g., person, location, and organization names) in text has proved important for several areas of natural language processing, including Information Retrieval and Information Extraction. However, despite the progress achieved in Named Entity Recognition on written texts, recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words, which makes them unavailable to the ASR system. This paper presents a comparative analysis of named entity extraction from written texts and from transcriptions. For transcriptions, we used spoken broadcast news; for written texts, we used both newspaper articles from the same domain as the transcriptions and manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.
