Towards a unified theory of spoken language processing

Spoken language processing is arguably the most sophisticated behavior of the most complex organism in the known universe and, unsurprisingly, scientists still have much to learn about how it works. Meanwhile, automated spoken language processing systems have begun to emerge in commercial applications, not as a result of any deep insight into the way in which humans process language, but largely as a consequence of the introduction of a 'data-driven' approach to building practical systems. At the same time, computational models of human spoken language processing have begun to emerge and, although this has stimulated greater interest in the relationship between human and machine behavior, the performance of the best models appears to be asymptoting some way short of the capabilities of the human listener/speaker. This paper discusses these issues and argues for the derivation of a 'unifying theory' capable of explaining and predicting both human and machine spoken language processing behavior. Such a theory would serve both communities and would represent a long-term 'grand challenge' for the scientific community in the emerging field of 'cognitive informatics'.
