Using single layer networks for discrete, sequential data: An example from Natural Language Processing

Natural Language Processing (NLP) is concerned with processing ordinary, unrestricted text. This work takes a new approach to a traditional NLP task, using neural computing methods. A parser which has been successfully implemented is described. It is a hybrid system, in which neural processors operate within a rule-based framework. The neural processing components belong to the class of Generalized Single Layer Networks (GSLNs). In general, supervised feed-forward networks need more than one layer to learn data that is not linearly separable. In some cases, however, the data can be pre-processed with a non-linear transformation and then presented in a linearly separable form for subsequent processing by a single-layer net. Such networks offer the advantages of functional transparency and operational speed. In our parser, the initial stage of processing maps linguistic data onto a higher-order representation, which can then be analysed by a single-layer network. This transformation is supported by information-theoretic analysis. Three different algorithms for the neural component were investigated: a single-layer net can be trained by weight adjustments based on (a) factors proportional to the input, as in the Perceptron, (b) factors proportional to the existing weights, or (c) an error-minimization method. In our experiments generalization ability varied little between the three; method (b) is used in the prototype parser, which is available via telnet.
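The core idea — a non-linear pre-processing step that makes the data linearly separable for a single-layer unit — can be illustrated with a minimal sketch. This is not the authors' parser or their linguistic representation; it is a toy example on XOR, the classic non-linearly-separable problem, where adding a single higher-order product feature makes the data separable, so a single-layer unit trained with rule (a), the Perceptron rule, can learn it (the prototype parser itself used rule (b)).

```python
def expand(x):
    """Non-linear pre-processing: map (x1, x2) onto a higher-order
    representation (x1, x2, x1*x2, 1), with the bias folded in as a
    constant feature. XOR is linearly separable in this space."""
    x1, x2 = x
    return [x1, x2, x1 * x2, 1.0]

def train_perceptron(samples, epochs=20, lr=0.1):
    """Single-layer unit trained with rule (a): weight change
    proportional to the input (the Perceptron rule)."""
    w = [0.0] * 4
    for _ in range(epochs):
        for x, target in samples:
            z = expand(x)
            out = 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else 0
            err = target - out
            w = [wi + lr * err * zi for wi, zi in zip(w, z)]
    return w

def predict(w, x):
    z = expand(x)
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else 0

# XOR: not linearly separable in the raw (x1, x2) space.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w = train_perceptron(xor)
print([predict(w, x) for x, _ in xor])  # → [0, 1, 1, 0]
```

Because the expanded data is linearly separable, the Perceptron convergence theorem guarantees the training loop reaches zero error in a finite number of updates; without the product term x1*x2, no single-layer unit could fit XOR at all.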