Paper information: A Supervised Learning Algorithm for Information Extraction from Textual Data


In this article we present a supervised learning algorithm that discovers finite state automata, expressed as regular expressions, in textual data. The automata generate languages consisting of the various representations of features useful in information extraction. We have successfully applied this learning technique to the extraction of textual features from police incident reports [2]. In this article we present the results of applying our algorithm to the extraction of the ‘problem solved’ from patents. The ‘problem solved’ in a patent identifies the particular solution that the patent offers to an insufficiency in the prior art.
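To make the idea of learning a regular expression from labeled text concrete, here is a minimal sketch. It is not the paper's algorithm; it substitutes a deliberately simple technique, and the training sentences and the names `TRAINING`, `learn_pattern`, `extract`, and `context_words` are hypothetical. The few words preceding each annotated ‘problem solved’ span become one alternative of a single regular expression (and therefore of one finite state automaton). That expression then captures the phrase that follows in new text.

```python
import re

# Hypothetical annotated training data: each sentence is paired with the
# substring a human annotator marked as the 'problem solved'.
TRAINING = [
    ("It is an object of the present invention to reduce power consumption.",
     "reduce power consumption"),
    ("An object of this invention is to improve signal quality.",
     "improve signal quality"),
]


def learn_pattern(examples, context_words=4):
    """Compile one regular expression from labeled examples.

    A deliberately simple stand-in, not the paper's algorithm: the
    `context_words` words immediately preceding each labeled span become one
    alternative of the expression, and the feature itself is captured as the
    shortest run of non-period characters that follows, up to a period.
    """
    contexts = set()
    for sentence, span in examples:
        start = sentence.index(span)
        words = sentence[:start].split()[-context_words:]
        # Escape each word literally and allow any whitespace between words.
        contexts.add(r"\s+".join(re.escape(w) for w in words))
    alternation = "|".join(sorted(contexts))
    return re.compile(rf"(?:{alternation})\s+(?P<problem>[^.]+?)\.",
                      re.IGNORECASE)


def extract(pattern, text):
    """Return every 'problem solved' phrase the learned pattern matches."""
    return [m.group("problem") for m in pattern.finditer(text)]


if __name__ == "__main__":
    pattern = learn_pattern(TRAINING)
    unseen = ("It is therefore an object of the present invention "
              "to lower manufacturing cost.")
    print(extract(pattern, unseen))  # ['lower manufacturing cost']
```

The expression keeps only a fixed window of left context. That is what lets it match a sentence it was not trained on, such as `unseen`. A practical learner would also need to generalize the trigger words themselves rather than memorize them verbatim.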

[1] C. J. van Rijsbergen, et al. Information Retrieval, 1979, Encyclopedia of GIS.

[2] C. Cleverdon. On the Inverse Relationship of Recall and Precision, 1972.

[3] Robert A. Connolly, et al. Market Value and Patents: A Bayesian Approach, 1988.

[4] Adwait Ratnaparkhi, et al. A Maximum Entropy Approach to Identifying Sentence Boundaries, 1997, ANLP.

[5] Eric Brill, et al. Pattern-Based Disambiguation for Natural Language Processing, 2000, EMNLP.

[6] Manuel Trajtenberg, et al. Market Value and Patent Citations: A First Look, 2000.

[7] Eric Brill, et al. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging, 1995, CL.

[8] Hsinchun Chen, et al. Extracting Meaningful Entities from Police Narrative Reports, 2002, DG.O.

[9] Stephen Soderland, et al. Learning Information Extraction Rules for Semi-Structured and Free Text, 1999, Machine Learning.

[10] William M. Pottenger, et al. A Semi-supervised Algorithm for Pattern Discovery in Information Extraction from Textual Data, 2003, PAKDD.

[11] Anthony F. Breitzman, et al. Technological Powerhouse or Diluted Competence: Techniques for Assessing Mergers Via Patent Analysis, 2002.

[12] M. W. Shields. An Introduction to Automata Theory, 1988.

[13] Jeffrey D. Ullman, et al. Introduction to Automata Theory, Languages and Computation, 1979.