Technology and Implementation

This chapter highlights the technical factors and limitations that affect the collection and interpretation of speech signals. We concentrate on the corruption and distortion of the speech signal typically encountered in the real world and, where possible, indicate how significant these effects can be. Transmission and encoding of speech signals over mobile phone networks and the internet are almost invariably lossy, and this has an acute effect on the accuracy of speech recognition systems. Published research has shown a comparable effect on the accuracy of dysphonia and dysarthria detection. We discuss how specific aspects of the data collection process relate to the validity of assessments of new techniques. We highlight the current absence of a realistic database of remotely collected speech samples, and show that adherence to standardised methods and datasets is crucial to the evaluation of new algorithms. Methods for combining multiple features into a single result are frequently required, and these too are discussed in this chapter.
