VoiceSauce: A Program for Voice Analysis

VOICESAUCE is a new application, implemented in MATLAB, that provides automated voice measurements over time from audio recordings. The measures currently computed are F0, H1(*), H2(*), H4(*), H1(*)‐H2(*), H2(*)‐H4(*), H1(*)‐A1, H1(*)‐A2, H1(*)‐A3, energy, Cepstral Peak Prominence, F1–F4, and B1–B4, where (*) indicates that harmonic amplitudes are reported both with and without corrections for formant frequencies and bandwidths [Iseli et al. (2006)]. Formant values are calculated using the Snack Sound Toolkit, while F0 is calculated using the STRAIGHT algorithm; harmonic spectra magnitudes are computed pitch‐synchronously. VOICESAUCE takes as input a folder of WAV files and, for each input file, produces a MATLAB file with values every millisecond for all measures. It can operate over the whole input file or over segments delimited by a PRAAT TextGrid file. VOICESAUCE then takes these MATLAB outputs, optionally along with electroglottographic measurements obtained separately from PCQUIRERX, and provides con...
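As a rough illustration of the uncorrected harmonic measures named in the abstract (H1, H2, and H1‐H2), the following Python sketch estimates harmonic amplitudes for a single frame from a known F0. It is not VOICESAUCE's implementation, which is written in MATLAB, takes F0 from STRAIGHT, and computes magnitudes pitch‐synchronously. The function name `harmonic_db`, the Hann window, the search tolerance `tol`, the 1 Hz search step, and the synthetic two-harmonic test signal are all assumptions made for illustration, and no formant correction is applied.

```python
import cmath
import math


def harmonic_db(frame, fs, f0, k, tol=0.1, step_hz=1.0):
    """Peak magnitude (dB) of the Hann-windowed spectrum near the k-th harmonic.

    Searches the band k*f0*(1 +/- tol) in steps of step_hz. All parameter
    names and defaults are illustrative assumptions, not VoiceSauce settings.
    """
    n = len(frame)
    # A Hann window reduces leakage between neighbouring harmonics.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    lo = k * f0 * (1 - tol)
    hi = k * f0 * (1 + tol)
    best = 0.0
    f = lo
    while f <= hi:
        # Direct DFT evaluated at the single frequency f (in Hz).
        w = -2j * math.pi * f / fs
        mag = abs(sum(x * cmath.exp(w * i) for i, x in enumerate(windowed)))
        best = max(best, mag)
        f += step_hz
    # Small offset guards against log10(0) on a silent frame.
    return 20.0 * math.log10(best + 1e-12)


# Synthetic 40 ms frame (assumed test signal): the second harmonic has
# half the amplitude of the first.
fs = 16000
f0 = 200.0
frame = [math.sin(2 * math.pi * f0 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs)
         for i in range(int(0.04 * fs))]

h1 = harmonic_db(frame, fs, f0, 1)
h2 = harmonic_db(frame, fs, f0, 2)
h1_h2 = h1 - h2
print(f"H1-H2 = {h1_h2:.2f} dB")
```

Because the second harmonic of the test signal has half the amplitude of the first, H1‐H2 should come out at roughly 6 dB. Real speech would additionally need an F0 tracker and, for the starred measures, the formant-based correction cited in the abstract.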

[1] Abeer Alwan, et al. Inter- and intra-speaker variability of glottal flow derivative using the LF model, 2000, INTERSPEECH.

[2] Guus de Krom, et al. A Cepstrum-Based Technique for Determining a Harmonics-to-Noise Ratio in Speech Signals, 1993.

[3] Chao-Yang Lee, et al. Identifying isolated, multispeaker Mandarin tones from brief acoustic input: a perceptual and acoustic study, 2009, The Journal of the Acoustical Society of America.

[4] Abeer Alwan, et al. An improved correction formula for the estimation of harmonic magnitudes and its application to open quotient estimation, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] J. Hillenbrand, et al. Acoustic correlates of breathy vocal quality, 1994, Journal of speech and hearing research.

[6] Abeer Alwan, et al. Age- and Gender-Dependent Analysis of Voice Source Characteristics, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[7] P. Boersma. Praat: doing phonetics by computer (version 5.1.05), 2009.

[8] Roy D. Patterson, et al. An instantaneous-frequency-based pitch extraction method for high-quality speech transformation: revised TEMPO in the STRAIGHT-suite, 1998, ICSLP.

[9] Abeer Alwan, et al. Age, sex, and vowel dependencies of acoustic measures related to the voice source, 2007, The Journal of the Acoustical Society of America.

[10] Paul Boersma, et al. Praat, a system for doing phonetics by computer, 2002.

[11] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.

[12] Xuejing Sun, et al. Pitch determination and voice quality analysis using Subharmonic-to-Harmonic Ratio, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13] J W Hawks, et al. A formant bandwidth estimation procedure for vowel synthesis [43.72.Ja], 1995, The Journal of the Acoustical Society of America.

[14] Jody Kreiman, et al. Toward a taxonomy of nonmodal phonation, 2001, J. Phonetics.

[15] H M Hanson, et al. Glottal characteristics of female speakers: acoustic correlates, 1997, The Journal of the Acoustical Society of America.

[16] G. de Krom. A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals, 1993, Journal of speech and hearing research.

[17] Christina M. Esposito. Variation in contrastive phonation in Santa Ana Del Valle Zapotec, 2010, Journal of the International Phonetic Association.

[18] Chad Vicenik, et al. An acoustic study of Georgian stop consonants, 2010, Journal of the International Phonetic Association.

[19] Jonathan Harrington, et al. Phonetic Analysis of Speech Corpora, 2010.