Proper Name Extraction from Non-Journalistic Texts

This paper discusses the influence of the corpus on the automatic identification of proper names in texts. Techniques developed for the newswire genre are generally not sufficient for larger corpora containing texts that do not follow strict writing constraints (e.g., e-mail messages or transcriptions of oral conversations). After a brief review of the research performed on news texts, we present some of the problems involved in analyzing two different corpora: e-mails and hand-transcribed telephone conversations. Once the sources of errors have been presented, we describe an approach for adapting a proper name extraction system developed for newspaper texts to the analysis of e-mail.

Thierry Poibeau | Leila Kosseim