SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component

This article introduces SparkNG, a set of interactive tools for studying the fundamentals of speech production, perception, and processing. In addition to a voice production simulator, the set includes an interactive time-frequency representation tool, an auditory representation visualizer, and a vocal tract shape visualizer for introductory materials. The tools are distributed as compiled executables for Windows and Mac environments, which do not require a MATLAB license. The MATLAB sources of the tools and their constituent functions are publicly accessible under an open source license.
