Contextual self-organizing map: software for constructing semantic representations

In this article, we introduce a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on contextual information extracted from a text corpus, specifically on analyses of word co-occurrences in a large-scale electronic database of text. A target word is represented as the combination of two averages: the average of all words that precede the target in the corpus and the average of all words that follow it. These semantic representations can then be processed by a self-organizing map (SOM; Kohonen, Self-Organizing Maps, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Because the SOM preserves topography, it projects the statistical structure of the contexts onto a 2-D space in which words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such representations have applications in a variety of contexts, including computational modeling of language acquisition and processing. In this report, we present specific examples from two languages (English and Chinese) to demonstrate how the method is applied to extract the semantic representations of words.
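The two steps described above, building a context vector for each word and then mapping those vectors onto a 2-D grid with a SOM, can be illustrated with a minimal pure-Python sketch. It is not the package's implementation, and several details are assumptions: each word receives a fixed random code vector, "context" means only the immediately adjacent word on each side, the "combination" of the two averages is taken to be their concatenation, and the SOM is a basic online variant with a Gaussian neighborhood and linearly decaying learning rate and radius. The function names `random_codes`, `context_vectors`, `best_unit`, and `train_som` are hypothetical.

```python
import math
import random


def random_codes(vocab, dim, seed=0):
    """Give every word a fixed random Gaussian code vector (illustrative assumption)."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in sorted(vocab)}


def context_vectors(tokens, codes):
    """Represent each word as [mean code of preceding words] + [mean code of following words].

    Assumption: only the immediately adjacent word on each side counts as context.
    """
    dim = len(next(iter(codes.values())))
    left_sum = {w: [0.0] * dim for w in codes}
    right_sum = {w: [0.0] * dim for w in codes}
    n_left = {w: 0 for w in codes}
    n_right = {w: 0 for w in codes}
    for i, w in enumerate(tokens):
        if i > 0:  # accumulate the code of the preceding word
            for k, x in enumerate(codes[tokens[i - 1]]):
                left_sum[w][k] += x
            n_left[w] += 1
        if i + 1 < len(tokens):  # accumulate the code of the following word
            for k, x in enumerate(codes[tokens[i + 1]]):
                right_sum[w][k] += x
            n_right[w] += 1
    vectors = {}
    for w in codes:
        left = [x / n_left[w] if n_left[w] else 0.0 for x in left_sum[w]]
        right = [x / n_right[w] if n_right[w] else 0.0 for x in right_sum[w]]
        vectors[w] = left + right  # concatenation: 2 * dim values
    return vectors


def best_unit(weights, x):
    """Return the grid position whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=200, seed=0):
    """Basic online SOM: Gaussian neighborhood, linearly shrinking rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        rate = 0.5 * (1.0 - frac)
        sigma = max(0.5, sigma0 * (1.0 - frac))
        for x in rng.sample(data, len(data)):  # visit inputs in random order
            bmu = best_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2  # squared grid distance
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += rate * h * (x[k] - w[k])  # pull unit toward the input
    return weights


corpus = "the dog runs . the cat runs . a dog sleeps . a cat sleeps . the bird sings .".split()
codes = random_codes(set(corpus), dim=8)
vectors = context_vectors(corpus, codes)
words = sorted(vectors)
weights = train_som([vectors[w] for w in words], rows=4, cols=4)
positions = {w: best_unit(weights, vectors[w]) for w in words}
for w in words:
    print(w, positions[w])
```

In this toy corpus "dog" and "cat" occur in exactly the same contexts, so their context vectors are identical and they necessarily land on the same map unit. With a real corpus, contextual similarity is graded, and the map places words nearer or farther apart accordingly.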

[1] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.

[2] Shigeo Abe. Pattern Classification, 2001, Springer London.

[3] M. Maratsos, et al. The internal language of children's syntax: The ontogenesis and representation of syntactic categories, 1980.

[4] F. D. Saussure, et al. Course in general linguistics: Cours de linguistique générale (Orig.), 1966.

[5] Eve V. Clark, et al. The proceedings of the thirtieth annual Child Language Research Forum, 2000.

[6] Ping Li, et al. The Acquisition of Chinese Characters: Corpus Analyses and Connectionist Simulations, 2006.

[7] Mark Steyvers, et al. Topics in semantic representation, 2007, Psychological Review.

[8] Jeanette Altarriba, et al. Bilingual sentence processing, 2002.

[9] Patrick Bonin, et al. Mental lexicon: "Some words to talk about words", 2004.

[10] David G. Stork, et al. Pattern Classification (2nd ed.), 1999.

[11] David G. Stork, et al. Pattern Classification, 1973.

[12] Ping Li, et al. Dynamic Self-Organization and Early Lexical Development in Children, 2007, Cogn. Sci.

[13] Michael N. Jones, et al. Representing word meaning and order information in a composite holographic lexicon, 2007, Psychological Review.

[14] Catherine E. Snow, et al. Children's language, 1990.

[15] Xiaowei Zhao, et al. Bilingual Lexical Representation in a Self-Organizing Neural Network Model, 2007.

[16] Ping Li, et al. A self-organizing connectionist model of bilingual processing, 2002.

[17] K. Plunkett, et al. A neurocomputational account of taxonomic responding and fast mapping in early word learning, 2010, Psychological Review.

[18] Ping Li, et al. An online database of phonological representations for Mandarin Chinese, 2009, Behavior Research Methods.

[19] Ping Li, et al. Competition: A Connectionist Model of the Learning of English Reversive Prefixes, 2010.

[20] F. Saussure, et al. Course in General Linguistics, 1960.

[21] Richard C. Anderson, et al. Properties of school Chinese: Implications for learning to read, 2003, Child Development.

[22] Ping Li, et al. Lexical Organization and Competition in First and Second Languages: Computational and Neural Mechanisms, 2009, Cogn. Sci.

[23] Teuvo Kohonen, et al. Self-Organizing Maps, 2010.

[24] Robert M. French, et al. The BIA++: Extending the BIA+ to a dynamical distributed connectionist framework, 2002, Bilingualism: Language and Cognition.

[25] M. Ross Quillian, et al. Retrieval time from semantic memory, 1969.

[26] Joshua B. Tenenbaum, et al. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth, 2001, Cogn. Sci.

[27] Ping Li, et al. Early lexical development in a self-organizing neural network, 2004, Neural Networks.

[28] Timo Honkela, et al. Contextual Relations of Words in Grimm Tales, Analyzed by Self-Organizing Map, 1995.

[29] Curt Burgess, et al. Modelling Parsing Constraints with High-dimensional Context Space, 1997.

[30] Jeffrey L. Elman, et al. Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations, 1997.

[31] Mark S. Seidenberg, et al. Semantic feature production norms for a large set of living and nonliving things, 2005, Behavior Research Methods.

[32] Michael N. Jones, et al. Redundancy in Perceptual and Linguistic Experience: Comparing Feature-Based and Distributional Models of Semantic Representation, 2010, Top. Cogn. Sci.

[33] R. Miikkulainen. Dyslexic and Category-Specific Aphasic Impairments in a Self-Organizing Feature Map Model of the Lexicon, 1997, Brain and Language.

[34] T. Kohonen, et al. Self-organizing semantic maps, 1989, Biological Cybernetics.

[35] Ping Li, et al. The Acquisition of Word Meaning through Global Lexical Co-occurrences, 2000.

[36] J. Altarriba, et al. A Self-Organizing Connectionist Model of Bilingual Processing, 2001.

[37] Ping Li, et al. PatPho: A phonological pattern generator for neural networks, 2002, Behavior Research Methods, Instruments, & Computers.

[38] Allan Collins, et al. A spreading-activation theory of semantic processing, 1975.

[39] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997.