Clustering for semantic purposes

This paper presents an innovative approach, within the framework of distributional semantics, to exploring semantic similarity in a technical corpus. Complementing a previous quantitative semantic analysis of the same domain, machining terminology, it sets out to uncover fine-grained semantic distinctions in order to explore the semantic heterogeneity of a number of technical items. Multidimensional scaling (MDS) analysis was carried out to cluster the first-order co-occurrences of a technical node according to the second-order and third-order co-occurrences they share. By taking into account the association values between relevant first-order and second-order co-occurrences, semantic similarities and dissimilarities between first-order co-occurrences could be determined, along with the corresponding proximities and distances on a graph. In discussing the methodology and results of statistical clustering techniques for semantic purposes, we pay special attention to linguistic and terminological interpretation.
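The general idea of placing first-order co-occurrences on a map according to the second-order co-occurrences they share can be illustrated with a minimal sketch. It is not the procedure used in the paper: the words and association scores are invented toy data, cosine distance is an assumed dissimilarity measure, and classical (Torgerson) MDS stands in for whichever MDS variant the study applied. The helper names `cosine_distance_matrix` and `classical_mds` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distances between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    sim = np.clip(unit @ unit.T, -1.0, 1.0)
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    return dist


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean
    distances approximate the given dissimilarities."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)      # drop negative values
    return eigvecs[:, order] * np.sqrt(top_vals)


# Invented toy data (assumption): rows are first-order co-occurrences of
# some node, columns are second-order co-occurrences, and cells hold
# association scores between them.
first_order = ["mill", "lathe", "drill", "speed", "feed"]
assoc = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.5, 5.0, 0.5, 0.0],
    [4.0, 3.5, 0.0, 0.5],
    [0.0, 0.5, 5.0, 4.0],
    [0.5, 0.0, 4.0, 5.0],
])

dist = cosine_distance_matrix(assoc)
coords = classical_mds(dist, n_components=2)

for word, (x, y) in zip(first_order, coords):
    print(f"{word:>6}: ({x:+.3f}, {y:+.3f})")
```

In this toy setting, rows with similar association profiles end up close together on the two-dimensional map, while rows with different profiles are placed far apart. That is the kind of proximity and distance reading the abstract refers to.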
