Acquisition of Linguistic Patterns

This paper presents a method for the automatic acquisition of linguistic patterns that can be used for knowledge-based information extraction from texts. In the knowledge-based approach to information extraction, linguistic patterns play a central role in the recognition and classification of input texts. Although the knowledge-based approach has proven effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Manual creation of patterns is time-consuming and error-prone, even for a small application domain. To solve the scalability and portability problems, automatic acquisition of patterns must be provided. In this paper, we present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. A specialized representation of patterns, called FP-structures, has been defined. Patterns are constructed in the form of FP-structures from training texts, and the acquired patterns are tuned further through the generalization of semantic constraints. An inductive learning mechanism is applied in the generalization step. The PALKA system has been used to generate patterns for our information extraction system developed for the fourth Message Understanding Conference (MUC-4). MUC-4 was an ARPA-sponsored competitive evaluation of text analysis systems. Experimental results with a set of news articles from MUC-4 are discussed.
