A Supervised Learning Algorithm for Information Extraction from Textual Data

In this article we present a supervised learning algorithm that discovers finite state automata, expressed as regular expressions, in textual data. The automata generate languages consisting of the various representations of features that are useful in information extraction. We have successfully applied this learning technique to the extraction of textual features from police incident reports [2]. Here we present the results of applying the algorithm to the extraction of the ‘problem solved’ in patents. The ‘problem solved’ in a patent identifies the particular solution to an insufficiency in prior art that the patent addresses.