A Statistically Emergent Approach for Language Processing: Application to Modeling Context Effects in Ambiguous Chinese Word Boundary Perception

This paper proposes that language understanding can be modeled as a collective phenomenon that emerges from a myriad of diverse microscopic activities, analogous to crystallization in chemistry. The essential features of the model are asynchronous parallelism, temperature-controlled randomness, and statistically emergent active symbols. We present a computer program that tests the model on the task of capturing the effect of context on the perception of ambiguous word boundaries in Chinese sentences. The program takes a holistic approach in which word identification is an integral component of sentence analysis. Various types of knowledge, from statistical to linguistic, are integrated to perform word boundary disambiguation and sentential analysis together. Experimental results show that the model addresses word boundary ambiguity effectively.
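To make "temperature-controlled randomness" concrete, the following is a minimal sketch and not the program described in the paper. It assumes a Boltzmann-style rule in which a candidate segmentation is chosen with probability proportional to exp(score / temperature). The function name, the candidate labels, and the scores are all hypothetical and chosen only for illustration.

```python
import math
import random


def choose_with_temperature(scores, temperature, rng):
    """Return an index drawn with probability proportional to
    exp(score / temperature). Hypothetical helper, not from the paper."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((s - top) / temperature) for s in scores]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


# Two competing segmentations of an ambiguous three-character string "ABC".
# The scores are made-up stand-ins for how well each reading fits its context.
candidates = ["AB | C", "A | BC"]
scores = [2.0, 1.0]

rng = random.Random(0)
for temperature in (100.0, 1.0, 0.05):
    picks = [choose_with_temperature(scores, temperature, rng) for _ in range(1000)]
    share = picks.count(0) / len(picks)
    print(f"T={temperature}: chose '{candidates[0]}' in {share:.0%} of draws")
```

At a high temperature the two readings are chosen almost equally often, so the search explores freely. As the temperature falls, the better-scoring reading dominates, which is the sense in which the analysis gradually "crystallizes" under this assumed rule.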
