SEMANTIC CLUSTERING OF VERBS: Analysis of Morphosyntactic Contexts Using the SOM Algorithm

Obtaining semantic or functional word categories from data in an unsupervised manner is a problem motivated both from the linguistic point of view and from that of constructing language models for various language processing tasks. In this work, we use the self-organizing map (SOM) algorithm to visualize and cluster common Finnish verbs based on functional and semantic information coded by case marking and function words such as postpositions and adverbs. First, based on a data set of over 500,000 utterances of 25 verbs, we studied (a) the base forms and (b) the most common word forms of the same verbs (4764 forms). Second, the first experiment was repeated on a set of 600 verbs. The results show that even the simple feature selection used in these experiments is suitable for rough automatic categorization of verbs on the basis of data extracted from unrestricted texts. In particular, the results demonstrate the importance of cultural, social and emotional dimensions in lexical
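The general idea of placing verbs on a self-organizing map according to their context features can be illustrated with a minimal sketch. It is not the implementation used in this work: the feature names, the verb labels and all counts below are invented placeholders, and the map size, learning rate and neighbourhood schedule are arbitrary assumptions chosen only to keep the example small.

```python
import math
import random

# Hypothetical context features and made-up co-occurrence counts.
# None of these names or numbers come from the study; they are placeholders.
FEATURES = ["partitive_object", "illative_argument",
            "elative_argument", "following_adverb"]
COUNTS = {
    "verb_a": [40, 5, 3, 2],
    "verb_b": [38, 6, 2, 4],
    "verb_c": [2, 45, 1, 2],
    "verb_d": [3, 2, 44, 1],
    "verb_e": [40, 5, 3, 2],  # same profile as verb_a, on purpose
}


def normalize(counts):
    """Turn raw counts into relative frequencies (all zeros stay zeros)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(vectors, rows=3, cols=3, epochs=200,
              lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with the online update rule."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialised randomly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                # Move each unit towards x, weighted by its grid distance
                # from the best-matching unit.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


vectors = [normalize(c) for c in COUNTS.values()]
weights = train_som(vectors)
placement = {verb: best_matching_unit(weights, normalize(c))
             for verb, c in COUNTS.items()}

for verb, unit in sorted(placement.items()):
    print(verb, "->", unit)
```

Verbs with similar feature profiles tend to land on the same or neighbouring map units, which is what makes the map usable for rough categorization. Real experiments would use far larger feature sets and maps than this toy configuration.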
