Extracting Phonetic Knowledge from Learning Systems: Perceptrons, Support Vector Machines and Linear Discriminants

Speech perception relies on the human ability to decode continuous, analogue sound-pressure waves into discrete, symbolic labels (‘phonemes’) with linguistic meaning. Aspects of this signal-to-symbol transformation have been studied intensively over many decades using psychophysical procedures. The perception of (synthetic) syllable-initial stop consonants has been especially well studied, since these sounds display a marked categorization effect: they are typically dichotomised into ‘voiced’ and ‘unvoiced’ classes according to their voice onset time (VOT). In this case, the category boundary bears a systematic relation to the (simulated) place of articulation, but there is no currently accepted explanation of this phenomenon. Categorization effects have now been demonstrated in a variety of animal species as well as humans, indicating that their origins lie in general auditory and/or learning mechanisms rather than in some ‘phonetic module’ specialized to human speech processing. In recent work, we have demonstrated that appropriately trained computational learning systems (‘neural networks’) also display the same systematic behaviour as human and animal listeners. Networks are trained on simulated patterns of auditory-nerve firings in response to synthetic ‘continua’ of stop-consonant/vowel syllables varying in place of articulation and VOT. Unlike real listeners, such a software model is amenable to analysis aimed at extracting the phonetic knowledge acquired in training, so providing a putative explanation of the categorization phenomenon. Here, we study three learning systems: single-layer perceptrons, support vector machines and Fisher linear discriminants, and we highlight similarities and differences between these approaches. We find that support vector machines, a modern inductive-inference technique designed for small sample sizes, give the most convincing results. Knowledge extracted from the trained machine indicates that the phonetic percept of voicing is easily and directly recoverable from auditory (but not acoustic) representations.
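
All three learners named above are linear classifiers, so the ‘knowledge extraction’ step amounts to inspecting the learned weight vector and the position of the decision boundary. The sketch below is illustrative only and is not the authors' code: it assumes scikit-learn is available and substitutes synthetic Gaussian feature vectors for the simulated auditory-nerve firing patterns, with the first feature loosely standing in for a VOT-sensitive cue; all variable names are hypothetical.

```python
# Minimal sketch (assumptions: numpy + scikit-learn; synthetic data in place of the
# simulated auditory-nerve representations used in the paper). Trains the three linear
# learners discussed in the abstract and inspects their weight vectors as a simple
# form of knowledge extraction.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical feature vectors: each column is an auditory channel; the first column
# plays the role of a VOT-like cue separating 'voiced' (label 0) from 'unvoiced' (label 1).
n_per_class, n_features = 40, 8
voiced = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
unvoiced = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
unvoiced[:, 0] += 3.0                      # shift along the VOT-like dimension
X = np.vstack([voiced, unvoiced])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

models = {
    "perceptron": Perceptron(max_iter=1000, tol=1e-3),
    "linear SVM": SVC(kernel="linear", C=1.0),
    "Fisher LDA": LinearDiscriminantAnalysis(),
}

for name, model in models.items():
    model.fit(X, y)
    w = model.coef_.ravel()                # normal vector of the learned decision plane
    w = w / np.linalg.norm(w)              # normalise so the three solutions are comparable
    # Crossing point on the VOT-like feature, holding the other features at zero:
    boundary = -model.intercept_[0] / model.coef_[0, 0]
    print(f"{name:11s} weights={np.round(w, 2)} boundary on feature 0 ~ {boundary:.2f}")
```

Comparing the normalised weight vectors shows how strongly each learner relies on the VOT-like dimension and where it places the category boundary; in the paper's setting, the analogous comparison is made over auditory-nerve-based representations of the synthetic syllable continua rather than over artificial features.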
