Improved vocal tract model for the elongation of segment lengths in a real time

Abstract: This work implements an efficient approach to modeling the fractional delay in elongated cylindrical segments of the vocal tract in waveguide modeling. The vocal tract is divided into uniform cylindrical segments of variable length. In this case, the time taken by a sound wave to propagate axially through one segment may not be an integer multiple of the time taken through another, so the axial delay of each fractionally elongated segment is necessarily a fractional delay. In previous work, two extra cylindrical segments of equal length were added to accommodate the fractional delay of each elongated segment and keep the number of segments even. The proposed model adds only a single extra segment for each fractionally elongated segment, which reduces both memory and computational cost. To keep the number of segments even, the extra segment and the original segment are treated as a single long segment. Lagrange interpolation is used to approximate the fractional delay. The model supports the elongation of any arbitrary cylindrical segment by a suitable scaling factor, and its results are validated against an accurate benchmark model. It uses a single algorithm, and there is no need to group the segments into sections in order to elongate intermediate segments.
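
As an illustration of the Lagrange interpolation step mentioned in the abstract, the following is a minimal sketch of a generic Lagrange fractional-delay FIR filter. It is not the paper's algorithm: the function names, the sampling rate, the sound speed, and the segment length are all hypothetical values chosen for the example. The coefficients follow the standard formula h[n] = prod over k != n of (D - k) / (n - k), for n = 0..N, where D is the desired delay in samples and N is the filter order.

```python
def segment_delay_samples(length_m, fs_hz=44100.0, c_m_per_s=350.0):
    """Propagation delay through a segment, in samples.

    fs_hz and c_m_per_s are assumed values for illustration only.
    """
    return length_m * fs_hz / c_m_per_s


def lagrange_fd_coeffs(delay, order):
    """FIR coefficients h[0..order] approximating a delay of `delay` samples.

    h[n] = prod_{k != n} (delay - k) / (n - k)
    """
    coeffs = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay - k) / (n - k)
        coeffs.append(h)
    return coeffs


def apply_fir(signal, coeffs):
    """Causal FIR filtering; samples before the start are treated as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for n, h in enumerate(coeffs):
            if i - n >= 0:
                acc += h * signal[i - n]
        out.append(acc)
    return out


# Hypothetical example: a segment elongated by a scaling factor of 1.3.
base_length_m = 350.0 / 44100.0           # one sample of delay at the assumed rates
delay = segment_delay_samples(1.3 * base_length_m)
coeffs = lagrange_fd_coeffs(delay, order=3)
ramp = [float(i) for i in range(8)]
delayed_ramp = apply_fir(ramp, lagrange_fd_coeffs(0.5, 1))
```

A first-order filter reduces to linear interpolation between adjacent samples, while higher orders give a flatter magnitude response at low frequencies at the cost of more multiplications per sample. The approximation is generally most accurate when the delay lies near the middle of the filter, around order / 2.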
