Lightweight Parsing of Classifications into Lightweight Ontologies

Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as those used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure. We then show how this structure can be used to detect patterns of language that a lightweight parser can process with an average accuracy of 96.82%. This enables a deeper understanding of the semantics of natural language metadata. We show that it improves the accuracy of automatically translating classifications into lightweight ontologies by almost 18%. Such ontologies are required by semantic matching, search and classification algorithms.
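The abstract does not spell out the parser's patterns or its translation rules. The Python sketch below only illustrates the general idea of matching a label's part-of-speech sequence against a pattern and translating matching labels into a propositional formula. It rests on several assumptions that are not taken from the paper:

- labels arrive already tagged with Penn Treebank tags;
- the single pattern accepts noun phrases joined by commas or "and"/"or";
- "and", "or" and commas map to disjunction (`|`), and modifier plus head noun maps to conjunction (`&`);
- the function name `parse_label` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: each label is already POS-tagged with Penn Treebank tags
# (e.g. by an off-the-shelf tagger); tagging itself is out of scope here.
Tagged = List[Tuple[str, str]]

# Hypothetical pattern: a noun phrase is zero or more adjectives (JJ) followed
# by a noun (NN, NNS, NNP, NNPS); noun phrases are separated by "," or CC.
NP = r"(?:JJ )*(?:NNS?|NNPS?)"
PATTERN = re.compile(rf"^{NP}(?: (?:,|CC) {NP})*$")


def parse_label(tagged: Tagged) -> Optional[str]:
    """Translate a tagged label into a formula string, or return None when the
    label falls outside the supported pattern (deferring to a fuller parser)."""
    tags = " ".join(tag for _, tag in tagged)
    if not PATTERN.match(tags):
        return None

    disjuncts: List[List[str]] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag == ",":
            disjuncts.append(current)  # a comma closes the current noun phrase
            current = []
        elif tag == "CC":
            # Only "and"/"or" are supported; other conjunctions ("but", "nor")
            # would need different semantics, so reject them.
            if token.lower() not in ("and", "or"):
                return None
            disjuncts.append(current)
            current = []
        else:
            current.append(token.lower())
    disjuncts.append(current)

    def conjoin(np: List[str]) -> str:
        # Modifiers and the head noun of one noun phrase are conjoined.
        return np[0] if len(np) == 1 else "(" + " & ".join(np) + ")"

    # Assumption: coordinated noun phrases denote the union of their classes.
    return " | ".join(conjoin(np) for np in disjuncts)
```

For example, `parse_label([("Italian", "JJ"), ("cuisine", "NN"), ("and", "CC"), ("French", "JJ"), ("wines", "NNS")])` yields `(italian & cuisine) | (french & wines)`. A label such as "Books by Italian authors" (tags `NNS IN JJ NNS`) returns `None`. The design makes the fast path cheap and explicit. Labels that fit a known pattern are translated directly. Everything else is flagged for a heavier parser or for manual handling.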
