A dynamic tonal perception model for optimal pitch stylization

Automatic pitch stylization is an important resource for researchers working on both prosody and speech technologies. To be useful, the stylized F0 curve should contain as few control points as possible while remaining perceptually close to the original curve. Here, a pitch stylization algorithm is presented that aims at the optimal balance between the number of control points and perceptual equality with the original curve. Rather than being defined by statistical closeness to the original F0 curve, the quality of the stylized curve is defined on the basis of a dynamic tonal perception model. The number of control points is optimized on the basis of previous results showing that the stylization can be more radical in those areas of the signal where tone perception is less accurate, i.e. in non-prominent areas. Perceptual tests show that, with respect to perceptual equality, this approach performs as well as other reference approaches while using significantly fewer control points. Although its theoretical background employs phonological units such as syllables, the proposed approach is phonetic and does not require any preliminary segmentation or annotation step. Instead, it combines acoustic parameters related to syllabification and prominence detection into a single model designed to be both integrated, in the sense that it does not introduce any pitfalls in the process, and dynamic, in the sense that it does not include rigid tonal perception thresholds.
