A connectionist approach to word sense disambiguation

The title of Cottrell's book mentions only two concepts: connectionism and lexical disambiguation. That's misleading, because the book has much more to offer than just that. Among the topics addressed are parsing, agrammatism, connectionist inheritance hierarchies, and structural ambiguity, and the integration of this wide-ranging set of topics is one of the strengths of the work.

The book appears four years after the 1985 University of Rochester dissertation on which it is based. Thus the flavor of connectionism that Cottrell uses is the coarse-grained localist representation used at Rochester in the early 1980s, in which each node in the network represents a concept. This contrasts with the distributed ("PDP") representations that became popular in the latter part of the decade, in which many nodes may contribute to the representation of a concept (Rumelhart and McClelland 1986). Cottrell has taken advantage of the delay in publication to restructure the work substantially and to add discussions of the later research. He seems to suggest (p. 7) that distributed representations are generally preferable because they can learn, whereas localist networks like his own must be designed individually by hand. Nevertheless, this research shows that hand-designed localist networks have considerable appeal.

Cottrell takes work in psycholinguistics as the starting point for his model of lexical access and disambiguation. In the early 1980s, it was discovered that in many circumstances people subconsciously consider all meanings of an ambiguous word, even if the preceding context makes one alternative preferable a priori. For example, the floral sense of the word "rose" is activated even when one hears "The congregation rose." Within a few hundred milliseconds, all senses but the one chosen as correct are deactivated again. (Subsequent research has qualified these results somewhat [Gorfein 1989], but the basic principle has proven to be robust.)
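To make the localist/distributed contrast concrete, here is a minimal Python sketch. The concept names and the micro-features (`plant`, `petals`, `thorns`, `motion`, `upward`) are hypothetical and chosen only for illustration; neither comes from Cottrell's networks. In the localist encoding each concept owns exactly one unit, so two distinct concepts never share an active unit; in the distributed encoding a concept is a pattern over shared units, so related concepts overlap.

```python
# Localist vs. distributed encodings of the same concepts.
# Concept names and micro-features are hypothetical, for illustration only.

CONCEPTS = ["rose/flower", "rose/stand-up", "tulip"]


def localist(concept):
    """One unit per concept: a one-hot vector over CONCEPTS."""
    return [1 if c == concept else 0 for c in CONCEPTS]


# Distributed: each concept is a pattern over shared feature units.
# FEATURES names the columns of the vectors below.
FEATURES = ["plant", "petals", "thorns", "motion", "upward"]
DISTRIBUTED = {
    "rose/flower":   [1, 1, 1, 0, 0],
    "rose/stand-up": [0, 0, 0, 1, 1],
    "tulip":         [1, 1, 0, 0, 0],
}


def overlap(u, v):
    """Number of units active in both patterns (dot product of 0/1 vectors)."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    print(overlap(localist("rose/flower"), localist("tulip")))
    print(overlap(DISTRIBUTED["rose/flower"], DISTRIBUTED["tulip"]))
```

The localist codes for the floral sense of "rose" and for "tulip" share no units, whereas their distributed codes share two; in a localist network, any relationship between the two concepts has to be stated as an explicit link between their nodes.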
The usual explanation for these results is in terms of priming and spreading activation in a semantic network, so a localist model is very natural. The input to Cottrell's networks is a string of words forming a syntactically simple sentence, such as "Bob threw a ball to the dog." This is done by activating the nodes corresponding to the words. An active node activates the other nodes to which it is connected by excitatory links and deactivates those to which it is connected by inhibitory links. A node can receive activation and inhibition at the same time; for example, an ambiguous word sends activation to all its senses, but the senses are mutually inhibitory. Thus the network may be unstable for some time until it settles into a pattern of activation that represents its "output": the nodes representing the relevant concepts are active and other nodes aren't. In the case of an ambiguous word, the meaning that is correct in context will presumably receive activation from more sources, or be pre-activated by the preceding context, and thus eventually be able to force its competitors into inhibition. This final pattern of activation may be construed as the interpretation of the sentence.

After the word-sense selection network come two more networks, running in parallel with one another: one for determining case roles and one for syntactic analysis. The case role network uses an "exploded" notion of cases; that is, rather than having one node representing, say, the agent role, Cottrell has one node for the agent of a propel action, one for the agent of a vomit action, and so on. (The topic area of Cottrell's example sentences ranges from baseball to emesis.)
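The settling process can be illustrated with a small simulation. This is a minimal sketch, not Cottrell's model: the node names, link weights, decay rate, clipping of activations to [0, 1], and synchronous update rule are all assumptions made for illustration. Input word nodes are clamped on; "rose" excites both of its senses equally; a hypothetical contextual link from "congregation" gives the stand-up sense a small extra push; and the two senses inhibit each other.

```python
# Toy localist network: spreading activation with lateral inhibition.
# All node names, weights, and the update rule are illustrative assumptions.

# (source, target) -> weight; positive = excitatory, negative = inhibitory.
WEIGHTS = {
    ("rose", "rose/flower"): 0.3,
    ("rose", "rose/stand-up"): 0.3,
    ("congregation", "rose/stand-up"): 0.1,  # hypothetical context link
    ("rose/flower", "rose/stand-up"): -0.5,  # the two senses inhibit each other
    ("rose/stand-up", "rose/flower"): -0.5,
}
DECAY = 0.5  # fraction of a node's own activation lost each step


def settle(input_words, steps=50):
    """Clamp the input word nodes at 1.0 and update all other nodes
    synchronously; return one activation snapshot per step."""
    nodes = {node for link in WEIGHTS for node in link}
    act = {node: (1.0 if node in input_words else 0.0) for node in nodes}
    history = []
    for _ in range(steps):
        new_act = {}
        for node in nodes:
            if node in input_words:
                new_act[node] = 1.0  # input words stay clamped on
                continue
            # Net input: weighted sum over all incoming links.
            net = sum(w * act[src] for (src, tgt), w in WEIGHTS.items() if tgt == node)
            # Decay the old activation, add the net input, clip to [0, 1].
            new_act[node] = min(1.0, max(0.0, (1 - DECAY) * act[node] + net))
        act = new_act
        history.append(dict(act))
    return history


if __name__ == "__main__":
    trace = settle({"congregation", "rose"})
    for step in (0, 3, len(trace) - 1):
        print(step + 1, round(trace[step]["rose/flower"], 3),
              round(trace[step]["rose/stand-up"], 3))
```

With the input {"congregation", "rose"}, both senses are active after the first step (about 0.3 for the floral sense and 0.4 for the stand-up sense), but the floral sense is driven down to zero within several more steps while the stand-up sense settles near 0.8. With "rose" alone, the two senses receive identical support and settle at the same level, so without some contextual bias this toy network never chooses between them.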
This seems counterintuitive, or at the very least unparsimonious; but I must admit that, modern linguistic theory notwithstanding, I know of no particular psycholinguistic evidence for the reality of a single concept of, say, agency that is activated for any and every sentence that involves an agent.

A feature of the parsing network is that it need not be constructed by hand; rather, it is generated automatically from a grammar and lexicon by a Lisp program. It parses only the very simple one-clause sentences needed to test the other parts of the system. Unlike the other parts of the system, the parser has no special claim to psychological reality. However, the minimal-attachment strategy of structural ambiguity resolution (namely, to attach a new constituent in the way that creates the fewest new nodes) "falls out" as a natural consequence of the design.

Cottrell includes an interesting discussion of his system's predictions for aphasia. If the system has some psychological reality, then one would expect "damage" to the network to result in behavior similar to that of aphasic patients. For example, if the connection between the case

[1] Ira Fischler et al. Associative facilitation without expectancy in a lexical decision task. 1977.

[2] Geoffrey E. Hinton et al. Massively Parallel Architectures for AI: NETL, Thistle, and Boltzmann Machines. AAAI, 1983.

[3] E. Saffran et al. Neuropsychological approaches to the study of language. British Journal of Psychology, 1982.

[4] Lyn Frazier et al. The interaction of syntax and semantics during sentence processing: eye movements in the analysis of semantically biased sentences. 1983.

[5] Alfonso Caramazza et al. A redefinition of the syndrome of Broca's aphasia: Implications for a neuropsychological model of language. Applied Psycholinguistics, 1980.

[6] Raymond Reiter et al. On Inheritance Hierarchies With Exceptions. AAAI, 1983.

[7] Marvin Minsky et al. A framework for representing knowledge. 1974.

[8] I. Fischler. Semantic facilitation without association in a lexical decision task. Memory & Cognition, 1977.

[9] D. Swinney et al. Effects of prior context upon lexical access during sentence comprehension. 1976.

[10] Marie Bienkowski et al. Automatic access of the meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive Psychology, 1982.

[11] W. Daniel Hillis et al. The Connection Machine (Computer Architecture for the New Wave). 1981.

[12] A. D. Groot. The range of automatic spreading activation in word priming. 1983.

[13] Graeme Hirst et al. Word Sense and Case Slot Disambiguation. AAAI, 1982.

[14] Saul Sternberg et al. The discovery of processing stages: Extensions of Donders' method. 1969.

[15] David S. Touretzky et al. The Mathematics of Inheritance Systems. 1984.

[16] Noam Chomsky et al. Aspects of the Theory of Syntax. 1965.

[17] Lyn Frazier et al. On comprehending sentences: Syntactic parsing strategies. 1979.

[18] M. Baltin et al. The mental representation of grammatical relations. 1985.

[19] Alan S. Brown et al. Information Processing and Cognition: The Loyola Symposium. 1976.

[20] Jordan Pollack et al. Natural Language Processing Using Spreading Activation and Lateral Inhibition. 1982.

[21] H. Kolk et al. Judgement of sentence structure in Broca's aphasia. Neuropsychologia, 1978.

[22] Allan Collins et al. A spreading-activation theory of semantic processing. 1975.

[23] M. Ross Quillian et al. The teachable language comprehender: a simulation program and theory of language. CACM, 1969.

[24] Harold Goodglass et al. The retrieval of syntax in Broca's aphasia. Brain and Language, 1975.

[25] Glenn Shafer et al. A Mathematical Theory of Evidence. 1976.

[26] Geoffrey E. Hinton et al. Optimal perceptual inference. 1983.

[27] Raymond Reiter et al. A Logic for Default Reasoning. Artif. Intell., 1987.

[28] Terry Winograd et al. Language as a Cognitive Process. CL, 1983.

[29] M. Posner et al. Attention and cognitive control. 1975.

[30] Elaine Rich et al. Default Reasoning as Likelihood Reasoning. AAAI, 1983.

[31] K. Stanovich et al. On priming by a sentence context. Journal of Experimental Psychology: General, 1983.

[32] Dedre Gentner et al. Some interesting differences between nouns and verbs. 1981.

[33] Janet D. Fodor et al. The sausage machine: A new two-stage parsing model. Cognition, 1978.

[34] Donald A. Norman et al. Simulating a Skilled Typist: A Study of Skilled Cognitive-Motor Performance. Cogn. Sci., 1982.

[35] James L. McClelland et al. An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 1982.

[36] Jean E. Newman et al. The phonological nature of phoneme monitoring: A critique of some ambiguity studies. 1978.

[37] C. D. Gelatt et al. Optimization by Simulated Annealing. Science, 1983.

[38] Mark S. Seidenberg et al. When does irregular spelling or pronunciation influence word recognition. 1984.

[39] G. S. Dell et al. A spreading-activation theory of retrieval in sentence production. Psychological Review, 1986.

[40] Harold Goodglass et al. Semantic field, naming, and auditory comprehension in aphasia. Brain and Language, 1976.

[41] J. Yates et al. Priming dominant and unusual senses of ambiguous words. 1978.

[42] Arthur L. Blumenthal et al. Observations with self-embedded sentences. 1966.

[43] John McCarthy et al. Applications of Circumscription to Formalizing Common Sense Knowledge. NMR, 1987.

[44] R. E. Warren et al. Stimulus encoding and memory. 1972.

[45] D. Gorfein. Resolving Semantic Ambiguity. Cognitive Science, 1989.

[46] Garrison W. Cottrell. Re: Inheritance Hierarchies with Exceptions. NMR, 1984.

[47] D. Swinney et al. Accessing lexical ambiguities during sentence comprehension: Effects of frequency of meaning and contextual bias. 1981.

[48] Robert William Milne et al. Predicting Garden Path Sentences. Cogn. Sci., 1982.

[49] Daniel Sabbah et al. Computing with Connections in Visual Recognition of Origami Objects. Cogn. Sci., 1988.

[50] James R. Lackner et al. Resolving ambiguity: Effects of biasing context in the unattended ear. 1972.

[51] W. Marslen-Wilson et al. The temporal structure of spoken language understanding. Cognition, 1980.

[52] Timothy W. Finin et al. The semantic interpretation of compound nominals. 1980.

[53] Robert Milne et al. Resolving Lexical Ambiguity in a Deterministic Parser. Comput. Linguistics, 1986.

[54] R. Schvaneveldt et al. Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 1971.

[55] Robert Schreuder et al. Effects of perceptual and conceptual similarity in semantic priming. 1984.

[56] Terrence J. Sejnowski et al. Learning semantic features. 1984.

[57] Christopher K. Riesbeck et al. Computational understanding: analysis of sentences and context. 1974.

[58] Myrna F. Schwartz et al. Sensitivity to grammatical structure in so-called agrammatic aphasics. Cognition, 1983.

[59] D. Swinney. Lexical access during sentence comprehension: (Re)consideration of context effects. 1979.

[60] David W. Etherington. Formalizing Non-Monotonic Reasoning Systems. 1983.

[61] Myrna F. Schwartz et al. The word order problem in agrammatism: II. Production. Brain and Language, 1980.

[62] Helen Mueller Gigley. Neurolinguistically constrained simulation of sentence comprehension: integrating artificial intelligence and brain theory. 1982.

[63] Christopher K. Riesbeck et al. Comprehension by computer: expectation-based analysis of sentences in context. 1976.

[64] Victor R. Lesser et al. A Retrospective View of the Hearsay-II Architecture. IJCAI, 1977.

[65] James L. McClelland. Putting Knowledge in its Place: A Scheme for Programming Parallel Processing Structures on the Fly. Cogn. Sci., 1988.

[66] Carol Conrad et al. Context effects in sentence comprehension: A study of the subjective lexicon. Memory & Cognition, 1974.

[67] Daniel Sabbah et al. A connectionist approach to visual recognition. 1982.

[68] Jordan B. Pollack et al. Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation. Cogn. Sci., 1988.

[69] M. Schwartz et al. The word order problem in agrammatism: I. Comprehension. Brain and Language, 1980.

[70] Mihai Nadin. T. Winograd, Language as a Cognitive Process, Volume I: Syntax. Artif. Intell., 1985.

[71] Roger C. Schank et al. Conceptual dependency: A theory of natural language understanding. 1972.

[72] E. Rosch. Cognitive Representations of Semantic Categories. 1975.

[73] Garrison W. Cottrell et al. A Model of Lexical Access of Ambiguous Words. AAAI, 1984.

[74] M. F. Garrett et al. Word and Sentence Perception. 1978.

[75] David W. Etherington et al. Finite default theories. 1982.

[76] A. Caramazza et al. Semantic feature representations for normal and aphasic language. 1974.

[77] Walter Anthony Cook et al. Case Grammar: Development of the Matrix Model (1970-1978). 1981.

[78] Jerome A. Feldman et al. Connectionist Models and Their Properties. Cogn. Sci., 1982.

[79] David S. Touretzky et al. Cancellation in a Parallel Semantic Network. IJCAI, 1981.

[80] R. E. Warren et al. Time and the spread of activation in memory. 1977.

[81] John J. L. Morton et al. Interaction of information in word recognition. 1969.

[82] Azriel Rosenfeld et al. Scene Labeling by Relaxation Operations. IEEE Transactions on Systems, Man, and Cybernetics, 1976.

[83] Chuck Rieger et al. Parsing and comprehending with word experts (a theory and its realization). 1982.

[84] V. M. Holmes et al. Prior context and the perception of lexically ambiguous sentences. Memory & Cognition, 1977.

[85] D. J. Foss et al. Some effects of context on the comprehension of ambiguous sentences. 1973.

[86] Lokendra Shastri et al. Semantic Networks and Neural Nets. 1984.

[87] Scott E. Fahlman et al. The hashnet interconnection scheme. 1980.

[88] C. A. Becker. Semantic context effects in visual word recognition: An analysis of semantic strategies. Memory & Cognition, 1980.

[89] Keith E. Stanovich et al. Source of Inhibition in Experiments on the Effect of Sentence Context on Word Recognition. 1982.

[90] E. Rosch. On the internal structure of perceptual and semantic categories. 1973.

[91] Matthew L. Ginsberg et al. Non-Monotonic Reasoning Using Dempster's Rule. AAAI, 1984.

[92] Graeme Hirst et al. Semantic interpretation against ambiguity. 1984.

[93] Bertram C. Bruce. Case Systems for Natural Language. Artif. Intell., 1975.

[94] A. Koriat et al. Semantic facilitation in lexical decision as a function of prime-target association. Memory & Cognition, 1981.

[95] N. Geschwind. Language and the brain. Scientific American, 1972.

[96] J. H. Neely. Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. 1977.

[97] Herbert A. Simon et al. The Sciences of the Artificial. 1970.

[98] A. Caramazza et al. Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language, 1976.

[99] Eugene Charniak et al. A Common Representation for Problem-Solving and Language-Comprehension Information. Artif. Intell., 1981.

[100] Steven Lawrence Small et al. Word expert parsing: a theory of distributed word-based natural language understanding. 1980.

[101] Mark S. Seidenberg et al. Pre- and postlexical loci of contextual effects on word recognition. Memory & Cognition, 1984.

[102] D. Swinney et al. Semantic facilitation across sensory modalities in the processing of individual words and sentences. Memory & Cognition, 1979.

[103] Jeffrey L. Elman et al. Speech Perception as a Cognitive Process: The Interactive Activation Model. 1983.

[104] Geoffrey E. Hinton. Shape Representation in Parallel Systems. IJCAI, 1981.

[105] Dana H. Ballard et al. Viewframes: A Connectionist Model of Form Perception. 1983.

[106] Scott E. Fahlman et al. NETL: A System for Representing and Using Real-World Knowledge. CL, 1979.