OLLIE: On-Line Learning for Information Extraction

This paper reports work on OLLIE, an open, distributed learning environment in which researchers can experiment with different Machine Learning (ML) methods for Information Extraction. Once a method reaches the required level of performance, it can be used to speed up the manual annotation process. OLLIE uses a browser-based client, while data storage and ML training are performed on servers. The different ML algorithms share a unified programming interface, so integrating new ones is straightforward.
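To illustrate what a unified programming interface for interchangeable ML algorithms might look like, the following is a minimal sketch and not OLLIE's actual API. The names `Learner`, `REGISTRY`, `register`, `build`, and `MajorityLearner` are all hypothetical, as is the representation of each token as a dictionary of string features; the abstract does not specify any of these details.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Sequence, Type


class Learner(ABC):
    """Hypothetical common interface; an assumption, not OLLIE's real API."""

    @abstractmethod
    def train(self, features: Sequence[Dict[str, str]], labels: Sequence[str]) -> None:
        """Fit the model on one feature dictionary and one label per token."""

    @abstractmethod
    def predict(self, features: Sequence[Dict[str, str]]) -> List[str]:
        """Return one predicted label per token."""


# Maps a short name to a learner class, so a server could select one by name.
REGISTRY: Dict[str, Type[Learner]] = {}


def register(name: str):
    """Class decorator that adds a learner class to the registry."""
    def decorator(cls: Type[Learner]) -> Type[Learner]:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("majority")
class MajorityLearner(Learner):
    """Toy baseline: always predicts the most frequent training label."""

    def __init__(self) -> None:
        self.label = "O"  # default label used before any training

    def train(self, features, labels):
        if labels:
            self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, features):
        return [self.label for _ in features]


def build(name: str) -> Learner:
    """Instantiate a registered learner by name."""
    return REGISTRY[name]()


if __name__ == "__main__":
    learner = build("majority")
    learner.train(
        [{"word": "Sheffield"}, {"word": "in"}, {"word": "the"}],
        ["Location", "O", "O"],
    )
    print(learner.predict([{"word": "London"}]))
```

Under this design, adding a new algorithm means writing one class that implements `train` and `predict` and registering it under a name; the calling code that trains models and requests annotations would not need to change.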
