General learning approach for event extraction: Case of management change event

Starting from an ontology of a targeted financial domain covering transaction, performance and management change news, relevant text segments containing at least one domain keyword are extracted. The linguistic pattern of each segment is generated automatically and serves initially as a learning model. Each pattern is composed of named entities, keywords and articulation words. Generic named entities such as organizations, persons, locations and dates, together with grammatical annotations, are produced by an automatic tool. During the learning step, each relevant segment is manually annotated with the targeted entities (roles) that structure an event of the ontology. Information extraction is then performed by associating a role with a specific entity: by aligning generic entities with specific entities, some strings of a text are annotated automatically. An original learning approach is presented. Experiments on the management change event show how recognition rates improve when different generalization tools are used.
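
The pipeline described above (keyword-based segment selection, pattern generation, and role assignment by alignment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the articulation words, the toy gazetteer standing in for the automatic named-entity tool, and the role names `company` and `incoming_person` are all assumptions made for illustration, and a pattern is matched here by exact equality only, without the generalization tools mentioned in the abstract.

```python
import re

# Hypothetical domain keywords for the management change event.
KEYWORDS = {"appointed", "resigned", "named", "succeeds"}
# Hypothetical articulation words that are kept in a pattern.
ARTICULATION = {"as", "of", "by", "from", "to"}
# Toy gazetteer standing in for the automatic named-entity tool.
ENTITIES = {
    "Acme Corp": "ORG",
    "Globex Inc": "ORG",
    "Jane Doe": "PERSON",
    "John Smith": "PERSON",
}


def split_segments(text):
    """Split a text into sentence-like segments."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_relevant(segment):
    """A segment is relevant if it contains at least one domain keyword."""
    tokens = {t.lower() for t in re.findall(r"\w+", segment)}
    return bool(tokens & KEYWORDS)


def entity_mentions(segment):
    """Return (mention, label) pairs in order of appearance."""
    found = [(segment.find(m), m, lab) for m, lab in ENTITIES.items() if m in segment]
    return [(m, lab) for _, m, lab in sorted(found)]


def to_pattern(segment):
    """Keep only entity tags, keywords and articulation words, in order."""
    for mention, label in ENTITIES.items():
        segment = segment.replace(mention, f"<{label}>")
    pattern = []
    for tok in re.findall(r"<[A-Z]+>|\w+", segment):
        if tok.startswith("<"):
            pattern.append(tok)
        elif tok.lower() in KEYWORDS or tok.lower() in ARTICULATION:
            pattern.append(tok.lower())
    return tuple(pattern)


def extract_roles(segment, learned):
    """Align the generic entities of a segment with the roles of a learned pattern."""
    roles = learned.get(to_pattern(segment))
    if roles is None:
        return {}
    return {role: mention for role, (mention, _) in zip(roles, entity_mentions(segment))}


# Learning step: one manually annotated segment (role names are assumptions).
training = "Acme Corp appointed Jane Doe as chief executive."
learned = {to_pattern(training): ["company", "incoming_person"]}

# Extraction step on unseen text.
text = "Globex Inc appointed John Smith as chairman. Shares were flat on Tuesday."
relevant = [s for s in split_segments(text) if is_relevant(s)]
results = [extract_roles(s, learned) for s in relevant]
```

In this sketch the second sentence is discarded because it contains no domain keyword, and the first is matched against the single learned pattern, so each generic entity slot inherits the role annotated during learning.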
