Can We Make Information Extraction More Adaptive?

It seems widely agreed that Information Extraction (IE) is now a tested language technology, one that has reached precision and recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains. In this paper we discuss some methods that have been tried to ease this problem and to create something more rapid than the benchmark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus. We shall discuss attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in Information Retrieval for tuning to a user's needs with feedback at an interface can be transferred to IE.
