Acoustic analysis of Sindhi speech - a precursor for an ASR system

In linguistics, the study of the functional and formative properties of speech sounds is usually referred to as acoustic phonetics. This research aims to demonstrate the acoustic-phonetic features of the elemental sounds of Sindhi, a language of the Indo-European family spoken mainly in the Sindh province of Pakistan and in some parts of India. In addition to the available articulatory-phonetic knowledge, acoustic-phonetic knowledge has been compiled for the identification and classification of Sindhi language sounds. Determining the acoustic features of the language sounds helps to group sounds with similar acoustic characteristics into a single natural class of meaningful phonemes. The obtained acoustic features and corresponding statistical results for a particular natural class of phonemes provide a clear understanding of the meaningful phonemes of Sindhi and also help to eliminate redundant sounds present in the inventory. At present, Sindhi includes nine redundant, three interchanging, three substituting, and three confused pairs of consonant sounds. Among the distinctive acoustic-phonetic aspects of Sindhi highlighted in this study are the acoustic features of its large number of contrastive voiced implosives and the acoustic impact of the language's flexibility in the insertion and digestion of short vowels in an utterance. In addition, the issue of the presence of an affricate class of sounds and of diphthongs in Sindhi is addressed. Compiling the set of meaningful phonemes of the language by learning their acoustic-phonetic features is one of the major goals of this study, because twelve of the Sindhi sounds studied are not yet part of the language's alphabet.
The main acoustic features learned for the phonological structures of Sindhi are the fundamental frequency, the formants, and the duration, together with the analysis of the obtained acoustic waveforms, the formant tracks, and the computer-generated spectrograms. The impetus for this research comes from the fact that detailed knowledge of the sound characteristics of the language elements has a broad variety of applications, from developing accurate synthetic speech production systems to modeling robust speaker-independent speech recognizers. The major research achievements and contributions of this study are: (1) the compilation and classification of the elemental sounds of Sindhi; (2) comprehensive measurement of the acoustic features of the language sounds, suitable for incorporation into the design of a Sindhi ASR system; (3) an understanding of the dialect-specific acoustic variation of the elemental sounds of Sindhi; (4) a speech database comprising voice samples of native Sindhi speakers; (5) identification of the language's redundant, substituting, and interchanging pairs of sounds; and (6) identification of the language's sounds that can potentially lead to segmentation and recognition errors in a Sindhi ASR system design. These achievements create the fundamental building blocks for future work on designing a state-of-the-art prototype of a gender- and environment-independent, continuous, and conversational ASR system for Sindhi.
