An Empirical Study of Automated Dictionary Construction for Information Extraction in Three Domains

Abstract

A primary goal of natural language processing (NLP) researchers is to develop a knowledge-based NLP system that is portable across domains. However, most knowledge-based NLP systems rely on a domain-specific dictionary of concepts, which represents a substantial knowledge-engineering bottleneck. We have developed a system called AutoSlog that addresses this bottleneck for a task called information extraction. Given an appropriate training corpus, AutoSlog automatically creates domain-specific dictionaries for information extraction. We have used AutoSlog to create a dictionary of extraction patterns for the terrorism domain. This dictionary achieved 98% of the performance of a hand-crafted dictionary that had required approximately 1500 person-hours to build. In this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare AutoSlog's performance across the three domains, discuss the lessons learned about the generality of the approach, and present results from two experiments showing that novice users can generate effective dictionaries with AutoSlog.
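To make the idea of a "dictionary of extraction patterns" concrete, here is a minimal Python sketch built on strong simplifying assumptions. Sentences are assumed to arrive already parsed into clauses; the `Clause` type, its fields, and the helpers `learn_patterns` and `extract` are all hypothetical. Only two pattern shapes are considered: the subject of a passive verb and the direct object of an active verb. The sketch shows how patterns can be learned from annotated examples and applied to new text. It is a toy illustration, not AutoSlog's actual heuristics or implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical pre-parsed clause; a real system would need a syntactic analyzer.
@dataclass
class Clause:
    subject: str
    verb: str
    passive: bool
    dobj: Optional[str] = None


# A pattern is keyed by (syntactic role, verb, is_passive) and maps to a slot name.
Pattern = Tuple[str, str, bool]


def learn_patterns(examples: List[Tuple[Clause, str, str]]) -> Dict[Pattern, str]:
    """Build a pattern dictionary from (clause, target noun phrase, slot) triples."""
    dictionary: Dict[Pattern, str] = {}
    for clause, target, slot in examples:
        if target == clause.subject and clause.passive:
            # Subject of a passive verb, e.g. "<victim> was murdered".
            dictionary[("subj", clause.verb, True)] = slot
        elif target == clause.dobj and not clause.passive:
            # Direct object of an active verb, e.g. "bombed <target>".
            dictionary[("dobj", clause.verb, False)] = slot
    return dictionary


def extract(dictionary: Dict[Pattern, str], clause: Clause) -> Dict[str, str]:
    """Apply every matching pattern in the dictionary to a new clause."""
    found: Dict[str, str] = {}
    subj_slot = dictionary.get(("subj", clause.verb, clause.passive))
    if subj_slot:
        found[subj_slot] = clause.subject
    dobj_slot = dictionary.get(("dobj", clause.verb, clause.passive))
    if dobj_slot and clause.dobj:
        found[dobj_slot] = clause.dobj
    return found


if __name__ == "__main__":
    training = [
        (Clause("the mayor", "murdered", True), "the mayor", "victim"),
        (Clause("guerrillas", "bombed", False, "the embassy"), "the embassy", "target"),
    ]
    patterns = learn_patterns(training)
    print(extract(patterns, Clause("a judge", "murdered", True)))  # {'victim': 'a judge'}
    print(extract(patterns, Clause("rebels", "bombed", False, "a bridge")))  # {'target': 'a bridge'}
```

Keying patterns on the verb and its voice keeps the toy dictionary domain-specific. A pattern learned from murder reports, for example, fires only on the same verb in the same voice.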

Ellen Riloff
