Ways of trying in Russian: clustering behavioral profiles

Abstract: This article proposes a methodology for addressing three long-standing problems in near-synonym research. First, we show how the internal structure of a group of near synonyms can be revealed. Second, we address the problem of distinguishing the subclusters, and the words within those subclusters, from one another. Finally, we show how these results identify the semantic properties that should be mentioned in lexicographic entries. We illustrate our methodology with a case study of nine near-synonymous Russian verbs that, in combination with an infinitive, express TRY. Our approach is corpus-linguistic and quantitative: assuming a strong correlation between semantic and distributional properties, we analyze 1,585 occurrences of these verbs taken from the Amsterdam Corpus and the Russian National Corpus, supplemented where necessary with data from the Web. We code each instance for 87 variables (also known as ID tags), i.e., morphosyntactic, syntactic and semantic characteristics that together form a verb's behavioral profile. The resulting co-occurrence table is evaluated by means of a hierarchical agglomerative cluster analysis and additional quantitative methods. The results show that this behavioral profile approach can be used (i) to elucidate the internal structure of the group of near-synonymous verbs and present it as a radial network structured around a prototypical member and (ii) to make explicit the scales of variation along which the near-synonymous verbs differ.
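The pipeline described in the abstract (tag each corpus instance, build a verb-by-tag co-occurrence table, then cluster the verbs agglomeratively) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the verb labels (`verb_a`, `verb_b`, `verb_c`), the ID tags, and the toy instances are hypothetical, and the choice of Euclidean distance with average linkage is a stand-in, since the abstract does not state which distance measure or amalgamation rule the authors used.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy annotations: (verb, ID tags observed in one corpus instance).
INSTANCES = [
    ("verb_a", ["past", "animate_subject", "main_clause"]),
    ("verb_a", ["past", "animate_subject", "negated"]),
    ("verb_b", ["past", "animate_subject", "main_clause"]),
    ("verb_b", ["present", "animate_subject", "main_clause"]),
    ("verb_c", ["imperative", "negated", "subordinate_clause"]),
    ("verb_c", ["imperative", "animate_subject", "subordinate_clause"]),
]


def behavioral_profiles(instances):
    """Return ({verb: vector of relative tag frequencies}, sorted tag list)."""
    counts, totals = {}, Counter()
    for verb, tags in instances:
        counts.setdefault(verb, Counter()).update(tags)
        totals[verb] += 1
    all_tags = sorted({tag for c in counts.values() for tag in c})
    # Each cell is the share of a verb's instances that carry the tag.
    profiles = {
        verb: [counts[verb][tag] / totals[verb] for tag in all_tags]
        for verb in counts
    }
    return profiles, all_tags


def euclidean(p, q):
    """Euclidean distance between two profile vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering; returns the merge history."""

    def avg_dist(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    clusters = [(verb,) for verb in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


profiles, tags = behavioral_profiles(INSTANCES)
merges = agglomerate(profiles)
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.2f}")
```

Read from first to last, the merge history plays the role of a dendrogram: verbs that merge early have similar behavioral profiles, and a late merge marks a boundary between subclusters. With real data the table would have nine rows and 87 tag-derived columns rather than three verbs and seven tags.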
