The shift-invariant discrete wavelet transform and application to speech waveform analysis.

The discrete wavelet transform may be used as a signal-processing tool for visualization and analysis of nonstationary, time-sampled waveforms. The highly desirable property of shift invariance can be obtained at the cost of a moderate increase in computational complexity and of accepting a least-squares inverse (pseudoinverse) in place of a true inverse. A new algorithm for the pseudoinverse of the shift-invariant transform is presented together with self-contained proofs; it is easier to implement in array-oriented scripting languages than existing algorithms. As one of many potential applications, a recorded speech waveform illustrates the benefits of combining shift invariance with pseudoinvertibility. Visualization shows the glottal modulation of vowel formants and frication noise, revealing secondary glottal pulses and other waveform irregularities. Additionally, performing sound waveform editing operations (cutting and pasting sections) on the shift-invariant wavelet representation automatically produces quiet, click-free section boundaries in the resulting sound. The capabilities of this wavelet-domain editing technique are demonstrated by changing the rate of a recorded spoken word: individual pitch periods are repeated to obtain a half-speed result, and alternate pitch periods are removed to obtain a double-speed result. The original pitch and formant frequencies are preserved. In informal listening tests, the results are clear and understandable.
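The abstract does not spell out the transform or the paper's new pseudoinverse algorithm. Purely for orientation, the sketch below takes the simplest textbook case rather than the paper's method. It makes three assumptions: Haar filters, the MODWT-style normalization (h = [1/2, 1/2], g = [1/2, -1/2]), and circular boundaries. With that normalization the undecimated transform is an isometry, so its adjoint is already its least-squares inverse. The function names `swt_haar`, `iswt_haar`, `repeat_segment` and `remove_segment` are hypothetical. The editing helpers only illustrate the idea of cutting and pasting coefficient columns and then applying the pseudoinverse. They do not reproduce the paper's pitch-period editing.

```python
import numpy as np


def swt_haar(x, levels):
    """Undecimated (shift-invariant) Haar transform with circular boundaries.

    Assumes MODWT-style filters h = [1/2, 1/2], g = [1/2, -1/2], upsampled
    "a trous" by 2**(j-1) at level j, which makes the map an isometry.
    Returns shape (levels + 1, N): detail rows W_1..W_J, then smooth row V_J.
    """
    v = np.asarray(x, dtype=float)
    rows = []
    for j in range(levels):
        s = 2 ** j                     # filter spacing at this level
        lagged = np.roll(v, s)         # lagged[t] = v[t - s], circularly
        rows.append((v - lagged) / 2)  # detail coefficients W_{j+1}
        v = (v + lagged) / 2           # smooth coefficients V_{j+1}
    rows.append(v)
    return np.vstack(rows)


def iswt_haar(c):
    """Least-squares inverse (pseudoinverse) of swt_haar.

    Because swt_haar is an isometry (A^T A = I), its adjoint A^T is also its
    pseudoinverse: for any coefficient array c, including one edited so that
    it no longer lies in the range of the transform, this returns the signal
    x that minimizes ||swt_haar(x) - c||.
    """
    c = np.asarray(c, dtype=float)
    levels = c.shape[0] - 1
    v = c[-1]
    for j in reversed(range(levels)):
        s = 2 ** j
        w = c[j]
        # Adjoint of one stage: x[t] = (v[t] + w[t])/2 + (v[t+s] - w[t+s])/2
        v = (v + w) / 2 + np.roll(v - w, -s) / 2
    return v


def repeat_segment(c, start, stop):
    """Hypothetical helper: paste a copy of columns [start, stop) of every
    coefficient row directly after the original section."""
    return np.concatenate([c[:, :stop], c[:, start:stop], c[:, stop:]], axis=1)


def remove_segment(c, start, stop):
    """Hypothetical helper: cut columns [start, stop) from every row."""
    return np.concatenate([c[:, :start], c[:, stop:]], axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
coeffs = swt_haar(x, levels=3)
longer = iswt_haar(repeat_segment(coeffs, 16, 32))   # 80 samples
shorter = iswt_haar(remove_segment(coeffs, 16, 32))  # 48 samples
```

Circularly shifting `x` circularly shifts every coefficient row by the same amount, which is the shift invariance the abstract refers to. For an unedited array `iswt_haar` reconstructs `x` exactly. For an edited array it returns the least-squares fit, so the residual is orthogonal to the range of the transform.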
