Descriptive Phrases: Understanding Natural Language Metadata

The rapid development of information and communication technologies has made vast amounts of heterogeneous information available. As these amounts grow ever faster, information integration and search technologies are becoming key to the success of the information society. To handle such amounts efficiently, data needs to be leveraged and analysed at deep levels. Metadata is a traditional way of gaining leverage over data. Deeper levels of analysis include language analysis, which started with purely string-based (keyword) approaches, continued with syntax-based approaches, and is now about to bring semantics into the processing loop. Natural language, being the easiest means of expression, is often used in metadata. We call such metadata "natural language metadata". Examples include various titles, captions and labels, such as web directory labels, picture titles, classification labels and business directory category names. These short pieces of text usually describe (sets of) objects. We call them "descriptive phrases". This thesis deals with the problem of understanding natural language metadata for further use in semantics-aware applications. It contributes a portrait of descriptive phrases, based on the analysis of several collected and annotated datasets of natural language metadata. It provides an architecture for natural language metadata understanding, complete with algorithms and an implementation, and it presents an evaluation of the proposed architecture.
