The GOLD Community of Practice: an infrastructure for linguistic data on the Web

The GOLD Community of Practice is proposed as a model for linking on-line linguistic data to an ontology. The model has two key components: linguistic data resources and knowledge resources, which focus on the knowledge derived from the data. Data resources include the ever-increasing amount of linguistic field data and other descriptive language resources being migrated to the Web. Knowledge resources capture generalizations about the data and are anchored in the General Ontology for Linguistic Description (GOLD). It is argued that such a model is in the spirit of the vision for a Semantic Web and thus provides a concrete methodology for rendering highly divergent resources semantically interoperable. The focus of this work, then, is not on annotation at the syntactic level, but rather on how annotated Web resources can be linked to an ontology. Furthermore, a methodology is given for creating specific communities of practice within the overall Web infrastructure for linguistics. Finally, ontology-driven search is discussed as a key application of the proposed model.
