Spectral and textural feature-based system for automatic detection of fricatives and affricates

Phoneme spotting in continuous speech has applications in speech recognition, smart audio filtering, multimedia synchronization and other fields. Many studies on phoneme spotting have been conducted, using a variety of approaches. We present two algorithms for spotting fricatives (such as /s/, /sh/, /f/) and affricates (/ts/, /ch/): one based on cepstrogram matching, and the other on a linear discriminant analysis (LDA) classifier whose feature vector is constructed from temporal, spectral and textural features of the audio signal. Tested on a selection of speech and song recordings, the algorithms achieve a correct identification rate of over 90% and a specificity of over 85%.
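To make the classifier-based idea concrete, the following is a minimal sketch of a two-class Fisher LDA applied to frame-level features. It is not the authors' implementation: the two features (zero-crossing rate as a temporal cue, and a first-difference energy ratio as a crude stand-in for a spectral cue), the frame length, the sample rate, the synthetic "noise-like" and "voiced-like" frames, and all function names are assumptions chosen for illustration. Textural features and cepstrogram matching are not shown.

```python
import math
import random

FRAME_LEN = 256      # samples per frame (assumed value)
SAMPLE_RATE = 16000  # Hz (assumed value)

def frame_features(frame):
    """Return [zero-crossing rate, first-difference energy ratio] for one frame."""
    pairs = list(zip(frame, frame[1:]))
    zcr = sum(1 for a, b in pairs if (a >= 0) != (b >= 0)) / (len(frame) - 1)
    energy = sum(x * x for x in frame) + 1e-12
    diff_energy = sum((b - a) ** 2 for a, b in pairs)
    return [zcr, diff_energy / energy]

def noise_frame(rng):
    """Synthetic stand-in for a fricative-like frame: white Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

def voiced_frame(rng):
    """Synthetic stand-in for a voiced frame: 150 Hz sine plus slight noise."""
    phase = rng.uniform(0.0, 2 * math.pi)
    step = 2 * math.pi * 150.0 / SAMPLE_RATE
    return [math.sin(phase + step * i) + rng.gauss(0.0, 0.01)
            for i in range(FRAME_LEN)]

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_lda(pos, neg):
    """Two-class Fisher LDA on 2-D features; returns weights w and bias b."""
    mp, mn = mean_vec(pos), mean_vec(neg)
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class covariance
    for rows, m in ((pos, mp), (neg, mn)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(pos) + len(neg) - 2
    s = [[v / dof for v in row] for row in s]
    s[0][0] += 1e-9  # tiny ridge term to keep the matrix invertible
    s[1][1] += 1e-9
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [mp[0] - mn[0], mp[1] - mn[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    mid = [(mp[0] + mn[0]) / 2, (mp[1] + mn[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])  # boundary through the class midpoint
    return w, b

def predict(w, b, x):
    """True if the frame is classified as noise-like (the positive class)."""
    return w[0] * x[0] + w[1] * x[1] + b > 0

def evaluate(seed=0, n_train=40, n_test=40):
    """Train on synthetic frames; return (sensitivity, specificity) on new ones."""
    rng = random.Random(seed)
    pos = [frame_features(noise_frame(rng)) for _ in range(n_train)]
    neg = [frame_features(voiced_frame(rng)) for _ in range(n_train)]
    w, b = fit_lda(pos, neg)
    tp = sum(predict(w, b, frame_features(noise_frame(rng)))
             for _ in range(n_test))
    tn = sum(not predict(w, b, frame_features(voiced_frame(rng)))
             for _ in range(n_test))
    return tp / n_test, tn / n_test

if __name__ == "__main__":
    sensitivity, specificity = evaluate()
    print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Here sensitivity is the fraction of positive (noise-like) frames that are detected, and specificity is the fraction of negative frames that are correctly rejected. The synthetic classes are far easier to separate than real fricatives in speech or song, so the numbers this sketch prints say nothing about the reported results.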
