Paper information - ISOcat: Corralling Data Categories in the Wild

To achieve true interoperability for valuable linguistic resources, different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation of the registry, dubbed ISOcat, is currently under construction. This paper briefly describes the new data model for data categories that will be introduced in this implementation. It then sketches the standardization process. Completed data categories can be reused by the community, either by selecting data categories through the ISOcat web interface or through other tools that interact with the ISOcat system using one of its various Application Programming Interfaces. Linguistic resources that use data categories from the registry should include persistent references, e.g. in the metadata or schemata of the resource, which point back to their origin. These data category references can then be used to determine whether two or more resources share common semantics, thus providing a level of interoperability close to the source data and a promising layer for semantic alignment on higher levels.
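As a rough illustration of the last point, the sketch below shows how data category references could be compared across resources. It is not taken from the paper or from the ISOcat APIs: the function name `align_by_reference`, the resource layout, and the `http://example.org/dcr/...` identifiers are all hypothetical stand-ins for whatever persistent references a real resource would carry in its metadata or schema.

```python
from collections import defaultdict


def align_by_reference(*resources):
    """Group local field names by the data category reference they point to.

    Each resource is a (name, {local_field: reference}) pair. Only references
    used by two or more resources are returned.
    """
    by_ref = defaultdict(dict)
    for name, fields in resources:
        for local_field, ref in fields.items():
            by_ref[ref].setdefault(name, []).append(local_field)
    return {ref: dict(users) for ref, users in by_ref.items() if len(users) >= 2}


# Hypothetical references (assumption): real ones would be the persistent
# identifiers issued by the registry, not example.org URIs.
lexicon = ("lexicon", {"pos": "http://example.org/dcr/DC-0001",
                       "lemma": "http://example.org/dcr/DC-0002"})
corpus = ("corpus", {"PartOfSpeech": "http://example.org/dcr/DC-0001",
                     "speaker": "http://example.org/dcr/DC-0003"})

shared = align_by_reference(lexicon, corpus)
for ref, users in shared.items():
    print(ref, users)
```

Here the two resources name the field differently ("pos" and "PartOfSpeech"), yet both point to the same reference, so under these assumptions they can be treated as sharing the same semantics for that field.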
