Since the introduction of Jordan's recurrent network [Jordan, M. I. (1986) Serial Order: A Parallel Distributed Processing Approach. Tech. Rep. No. 8604. Institute for Cognitive Science, University of California, San Diego.], which allows data with a temporal component to be processed, neural networks have been used routinely for sequence processing. This paper analyses the ability of this type of network to discriminate between languages on the basis of a small sample of text. The model was developed for potential use in the on-line version of the Trinity College 1872 Printed Catalogue, a library catalogue with entries in 14 different languages spanning over 5 centuries. It was thought that neural networks would perform well where the entries to be analysed comprised only a few words. The neural network's performance was compared with that of trigrams and of a suffix/morphology analysis. The trigrams proved superior, classifying over 92% of the entries correctly, compared to 88% for the neural network and 85% for the suffix/morphology analysis. Trigrams were also far superior in the speed at which statistics were compiled and the rate at which text was processed.
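The abstract does not spell out how the trigram classifier works, so the following is only a minimal sketch of one common way to do trigram-based language identification. It assumes character trigrams, add-one smoothing, and a log-probability score. The function names (`trigrams`, `train`, `score`, `classify`) and the toy training strings are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

def trigrams(text):
    """Return the character trigrams of text, with a space marking word edges."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """Build one trigram count table per language from {language: text}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}

def score(text, counts, vocab_size):
    """Log-probability of text under add-one-smoothed trigram counts (an assumption)."""
    total = sum(counts.values())
    return sum(math.log((counts[t] + 1) / (total + vocab_size))
               for t in trigrams(text))

def classify(text, models):
    """Return the language whose trigram table gives text the highest score."""
    vocab = set().union(*models.values())  # every trigram seen in any language
    return max(models, key=lambda lang: score(text, models[lang], len(vocab)))

# Toy training data, invented for illustration only.
samples = {
    "english": "the catalogue of the library of the college",
    "latin": "catalogus librorum impressorum qui in bibliotheca collegii",
}
models = train(samples)
```

Compiling the statistics here is a single counting pass over the training text, which is consistent with the abstract's remark that trigram statistics were quick to compile. A real system would need far more training text per language than these toy strings.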
[1] James L. McClelland, et al. Learning Subsequential Structure in Simple Recurrent Networks, 1988, NIPS.
[2] R. Palmer, et al. Introduction to the theory of neural computation, 1994, The advanced book program.
[3] A. Lawrence Spitz, et al. Determination of the Script and Language Content of Document Images, 1997, IEEE Trans. Pattern Anal. Mach. Intell.
[4] Seiichi Nakagawa, et al. Diction for phoneme/syllable/word-category and identification of language using HMM, 1990, ICSLP.
[5] A. House, et al. Toward automatic identification of the language of an utterance. I. Preliminary methodological con, 1977.
[6] Peter Fox. Treasures of the Library, Trinity College, Dublin, 1986.
[7] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[8] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[9] Michael I. Jordan. Serial Order: A Parallel Distributed Processing Approach, 1997.
[10] Morton David Rau. Language Identification by Statistical Analysis, 1974.