A platform for semantic annotations and ontology population using conditional random fields

Ontologies are widely used for organising and sharing knowledge, but building them is a laborious and time-consuming task. This paper has two aims: it describes the EADS DCS text-mining platform, in particular its service for annotating documents with semantic tags, and it presents an extension of that platform for the incremental learning of ontologies. Domain experts are assisted in the ontology population task by a recent machine learning technique, conditional random fields (CRFs). Annotations derived from the ontology are compared with annotations produced by a trained CRF model in order to detect candidate instances. An iterative process controlled by the experts leads to knowledge discovery and to the construction of an accurate ontology.
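
The comparison step can be illustrated with a minimal sketch. It assumes that both annotation sources yield character spans with a concept label, and that a span proposed by the CRF model but absent from the ontology-based annotations is a candidate instance to submit to the expert. The `Annotation` type and the `candidate_instances` function are hypothetical names introduced for illustration; they are not part of the platform described here.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """A labelled text span (hypothetical structure, for illustration only)."""
    start: int   # character offset where the span begins
    end: int     # character offset just past the end of the span
    label: str   # ontology concept assigned to the span
    text: str    # surface form of the span


def candidate_instances(
    ontology_annotations: List[Annotation],
    crf_annotations: List[Annotation],
) -> List[Annotation]:
    """Return CRF annotations that the ontology-based annotator did not produce.

    Assumption: two annotations match when span boundaries and label agree.
    """
    known = {(a.start, a.end, a.label) for a in ontology_annotations}
    return [
        a for a in crf_annotations
        if (a.start, a.end, a.label) not in known
    ]


# Toy document: "Airbus and Acme Corp signed a deal."
from_ontology = [Annotation(0, 6, "Organisation", "Airbus")]
from_crf = [
    Annotation(0, 6, "Organisation", "Airbus"),
    Annotation(11, 20, "Organisation", "Acme Corp"),
]

# Only the span unknown to the ontology is proposed to the expert.
candidates = candidate_instances(from_ontology, from_crf)
```

In an iterative setting, instances accepted by the expert would be added to the ontology before the next round, so that they are no longer reported as candidates.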
