Extracting Semantic Representations from Large Text Corpora

Many connectionist language processing models have now reached a level of detail at which more realistic representations of semantics are required. In this paper we discuss the extraction of semantic representations from the word co-occurrence statistics of large text corpora and present a preliminary investigation into the validation and optimisation of such representations. We find that there is significantly more variation across the extraction procedures and evaluation criteria than is commonly assumed.
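As a rough illustration of what extracting representations from word co-occurrence statistics can involve, the following minimal sketch counts the words that appear within a fixed window around each target word and compares the resulting count vectors. The toy corpus, the window size of 2, the function names, and the use of cosine similarity are all assumptions made for illustration; they are not the procedure or the evaluation criteria used in the paper.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(tokens, window=2):
    """For each target word, count the words found within `window`
    positions on either side (hypothetical helper, illustrative only)."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus standing in for a large text corpus (assumption).
corpus = (
    "the cat chased the mouse . "
    "the dog chased the cat . "
    "the mouse ate the cheese ."
)
tokens = corpus.split()
vectors = cooccurrence_vectors(tokens, window=2)

# Compare the contexts of two word pairs.
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_cheese = cosine(vectors["cat"], vectors["cheese"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~cheese: {sim_cat_cheese:.3f}")
```

Choices such as the window size, whether and how raw counts are normalised, and the similarity measure are exactly the kind of extraction parameters across which results can vary, so any such sketch should be read as one point in a larger space of options.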
