An occurrence-based model of word categorization

Small corpora present problems for traditional statistical analysis because their data are sparse. We discuss a methodology for classifying words in edited, plain-text corpora that has the potential to work on relatively small corpora. This approach, which we call occurrence-based processing, records which contexts occur around a given word but pays no attention to the number of times each context occurs. We obtain good results on an artificial language and compare them with Elman's connectionist analysis of the same artificial language. We obtain more modest results on real-world corpora, but these results are sufficient to draw some methodological and language-theoretic conclusions.
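The core idea, recording which contexts occur around a word while ignoring how often they occur, can be illustrated with a minimal sketch. The abstract does not define a context or a comparison measure, so the choices here are assumptions made only for illustration: a context is taken to be the pair of immediately adjacent words, and two words are compared by the Jaccard overlap of their context sets. The names `context_sets` and `jaccard` and the toy corpus are hypothetical.

```python
from collections import defaultdict


def context_sets(sentences):
    """Map each word to the SET of (left, right) neighbour pairs seen around it.

    Using a set rather than a counter is what makes this occurrence-based:
    a context seen once and a context seen many times are treated alike.
    The (left, right) window is an assumption, not the paper's definition.
    """
    contexts = defaultdict(set)
    for sentence in sentences:
        # Boundary markers give the first and last word a full context.
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
    return contexts


def jaccard(a, b):
    """Overlap of two context sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy corpus; the first sentence is repeated on purpose.
corpus = [
    "the cat sleeps".split(),
    "the dog sleeps".split(),
    "the cat sleeps".split(),
    "the dog eats".split(),
]

ctx = context_sets(corpus)
cat_dog = jaccard(ctx["cat"], ctx["dog"])
cat_sleeps = jaccard(ctx["cat"], ctx["sleeps"])
print(sorted(ctx["cat"]), cat_dog, cat_sleeps)
```

In this toy corpus the repeated sentence adds nothing to the context set of "cat", and "cat" shares contexts with "dog" but none with "sleeps", which is the kind of signal a word classifier could then group on.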

[1]  Zellig S. Harris,et al.  A Grammar of English on Mathematical Principles , 1982 .

[2]  Z. Harris,et al.  Book Reviews: The Form of Information in Science: Analysis of an Immunology Sublanguage , 1989, CL.

[3]  Eric Brill,et al.  Tagging an Unfamiliar Text With Minimal Human Supervision , 1992 .

[4]  Andrew Radford,et al.  Transformational Grammar: Contents , 1988 .

[5]  C. L. Baker,et al.  The Logical problem of language acquisition , 1984 .

[6]  J. Elman Representation and structure in connectionist models , 1991 .

[7]  Andrew Radford,et al.  Transformational Grammar: A First Course , 1988 .

[8]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci.

[9]  S. Pinker Learnability and Cognition: The Acquisition of Argument Structure , 1989 .

[10]  Noam Chomsky Lectures on Government and Binding: The Pisa Lectures , 1993 .

[11]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[12]  Lynette Hirschman,et al.  Discovery procedures for sublanguage selectional patterns , 1986 .

[13]  Ralph Grishman,et al.  Discovery Procedures for Sublanguage Selectional Patterns: Initial Experiments , 1986, Comput. Linguistics.

[14]  Alfred V. Aho,et al.  Data Structures and Algorithms , 1983 .

[15]  T. Givón,et al.  Syntax: A Functional-Typological Introduction , 1982 .

[16]  Steven Pinker,et al.  Language learnability and language development , 1985 .

[17]  John R. Taylor,et al.  Linguistic categorization : prototypes in linguistic theory , 1989 .

[18]  Noam Chomsky,et al.  Aspects of the theory of syntax , 1965 .