Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction

This paper presents a method for automatically acquiring linguistic patterns that can be used for knowledge-based information extraction from texts. In knowledge-based information extraction, linguistic patterns play a central role in recognizing and classifying input texts. Although the knowledge-based approach has proven effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Creating patterns manually is time-consuming and error-prone, even for a small application domain. To address the scalability and portability problems, patterns must be acquired automatically. We present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. We define a specialized pattern representation called the FP structure. Patterns are constructed as FP structures from the training texts, and the acquired patterns are then refined by generalizing their semantic constraints with an inductive learning mechanism. The PALKA system has been used to generate patterns for our information extraction system developed for the Fourth Message Understanding Conference (MUC-4).
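The generalization step can be illustrated with a minimal sketch. It assumes a hypothetical toy concept hierarchy and a simplified pattern (a phrase template plus one semantic constraint per slot); it is not PALKA's actual FP structure or code. When several training examples share the same phrase but fill a slot with different concepts, the slot's constraint is generalized to their lowest common ancestor in the hierarchy, in the style of the "climbing generalization tree" rule from inductive learning.

```python
from dataclasses import dataclass, field

# Hypothetical toy concept hierarchy: each concept maps to its parent (None = root).
HIERARCHY = {
    "civilian": "human",
    "military": "human",
    "human": "entity",
    "building": "physical-target",
    "vehicle": "physical-target",
    "physical-target": "entity",
    "entity": None,
}


def ancestors(concept):
    """Return the chain from a concept up to the root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = HIERARCHY[concept]
    return chain


def generalize(c1, c2):
    """Lowest common ancestor of two concepts (climbing-generalization-tree rule)."""
    above_c1 = set(ancestors(c1))
    for concept in ancestors(c2):
        if concept in above_c1:
            return concept
    raise ValueError("concepts share no common ancestor")


@dataclass
class Pattern:
    """Hypothetical simplified pattern: phrase template plus per-slot semantic constraints."""
    phrase: tuple
    constraints: dict = field(default_factory=dict)

    def absorb(self, fillers):
        """Widen each slot's constraint just enough to cover a new example's filler."""
        for slot, concept in fillers.items():
            old = self.constraints.get(slot)
            self.constraints[slot] = concept if old is None else generalize(old, concept)


def learn(examples):
    """Group (phrase, slot-fillers) examples by phrase and merge their constraints."""
    patterns = {}
    for phrase, fillers in examples:
        pattern = patterns.setdefault(phrase, Pattern(phrase))
        pattern.absorb(fillers)
    return patterns


if __name__ == "__main__":
    phrase = ("<perp>", "bombed", "<target>")
    training = [
        (phrase, {"perp": "military", "target": "building"}),
        (phrase, {"perp": "civilian", "target": "vehicle"}),
    ]
    print(learn(training)[phrase].constraints)
```

With these two examples, the sketch widens the perpetrator constraint from `military`/`civilian` to `human` and the target constraint from `building`/`vehicle` to `physical-target`. Taking the lowest common ancestor keeps each constraint as specific as the training data allows.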
