Information Extraction via Double Classification

Information Extraction is concerned with extracting relevant information from a document or a collection of documents. We propose an approach that consists of two classification-based machine learning loops. The first loop identifies the relevant sentences in a document. The second loop performs a word-level classification. We test the system on the Software Jobs corpus and carry out an extensive evaluation in which we discuss the influence of the different parameters. Furthermore, we show that the choice of evaluation method has an important influence on the results.
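
As an illustration of the two-loop idea, the following is a minimal sketch in Python, not the system evaluated here. It assumes that the word-level classifier is applied only to the sentences kept by the first loop. The function names, the keyword rules standing in for trained classifiers, the slot name "language", and the example text are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type aliases: any trained classifiers could be plugged in here.
SentenceClassifier = Callable[[List[str]], bool]
WordClassifier = Callable[[List[str], int], Optional[str]]


def extract(document: str,
            sentence_is_relevant: SentenceClassifier,
            word_label: WordClassifier) -> List[Tuple[str, str]]:
    """Run a sentence-level loop followed by a word-level loop."""
    results = []
    # Naive sentence splitting on periods (an assumption of this sketch).
    for sentence in document.split("."):
        tokens = sentence.split()
        # Loop 1: discard empty or irrelevant sentences.
        if not tokens or not sentence_is_relevant(tokens):
            continue
        # Loop 2: classify every word of a relevant sentence.
        for i, token in enumerate(tokens):
            label = word_label(tokens, i)
            if label is not None:
                results.append((token, label))
    return results


# Toy stand-ins for the two learned classifiers (illustrative rules only).
def toy_sentence_clf(tokens: List[str]) -> bool:
    return "skills" in tokens


def toy_word_clf(tokens: List[str], i: int) -> Optional[str]:
    return "language" if tokens[i] in {"Java", "Python"} else None


DOC = ("We are a growing company. "
       "Required skills include Java and Python. "
       "Java the island is nice to visit.")

print(extract(DOC, toy_sentence_clf, toy_word_clf))
```

In this toy run the third sentence is dropped by the first loop, so its occurrence of "Java" never reaches the word-level classifier.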