Bring vs. MTRoget: Evaluating automatic thesaurus translation

Evaluating automatic, language-independent methods for creating language technology resources is difficult, and it is confounded by a largely unknown quantity: the extent to which typological differences among languages determine whether results achieved for one language or language pair carry over to other languages. In the work presented here, as a simplifying assumption, language independence is taken as axiomatic within certain specified bounds. We evaluate the automatic translation of Roget’s “Thesaurus” from English into Swedish against an independently compiled Roget-style Swedish thesaurus, S.C. Bring’s “Swedish vocabulary arranged into conceptual classes” (1930). Our expectation is that this explicit evaluation of one of the thesauruses created in the MTRoget project will provide a good estimate of the quality of the other thesauruses created using similar methods.
