Speech Segmentation and Clustering Methods for a New Speech Recognition Architecture

Closing the gap between the performance of traditional speech recognition systems and human speech recognition skills requires a new architecture. A system capable of incremental learning offers one possible solution. This thesis introduces a bottom-up approach for such a speech processing system, consisting of a novel blind speech segmentation algorithm, a segmental feature extraction methodology, and data classification by incremental clustering. All methods were evaluated in extensive experiments on a broad range of test material, and the evaluation methodology itself was also scrutinized. The segmentation algorithm achieved results that exceed those reported in the current literature on blind segmentation. Possibilities for follow-up research on memory structures and intelligent top-down feedback in speech processing are also outlined.
