Understanding and Exploiting Language Diversity

Languages are well known to be diverse on all structural levels, from the smallest (phonemic) to the broadest (pragmatic). We propose a set of formal, quantitative measures for the language diversity of linguistic phenomena, for resource incompleteness, and for resource incorrectness. We apply these measures to lexical semantics, where we show how evidence of a high degree of universality within a given language set can be used to extend lexico-semantic resources in a precise, diversity-aware manner. We demonstrate our approach in two case studies. The first concerns polysemes and homographs among cases of lexical ambiguity. Contrary to past research, which focused solely on exploiting systematic polysemy, the notion of universality provides us with an automated method that is also capable of predicting irregular polysemes. The second automatically identifies cognates in existing lexical resources across the different orthographies of genetically unrelated languages. Contrary to past research, which focused on detecting cognates among the 225 concepts of the Swadesh list, we captured 3.1 million cognate pairs across 40 different orthographies and 335 languages by exploiting existing wordnet-like lexical resources.
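To make the second case study more concrete, the sketch below shows one common way of proposing cognate candidates among words that lexicalize the same concept in different languages. It is not the method used in this work, which the abstract does not detail. It assumes that the words have already been transliterated into a common Latin script, and the function names, the 0.5 threshold, and the toy word list are all hypothetical choices made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cognate_candidates(concept_words: dict, threshold: float = 0.5) -> list:
    """Return language pairs whose words for one concept look similar.

    concept_words maps a language code to a romanized word (assumption:
    transliteration to a shared script happened upstream). The threshold
    is an illustrative value, not one taken from the source.
    """
    pairs = []
    for (l1, w1), (l2, w2) in combinations(sorted(concept_words.items()), 2):
        if similarity(w1, w2) >= threshold:
            pairs.append((l1, l2))
    return pairs


# Toy input: words for the concept "night" in four languages.
night = {"eng": "night", "deu": "nacht", "fra": "nuit", "jpn": "yoru"}
candidates = cognate_candidates(night)
```

String similarity alone only proposes candidates. It cannot separate true cognates from loanwords or chance resemblances, so a real pipeline would need further evidence before accepting a pair.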
