The role of perception in defining tonal targets and their alignment

Tonal targets can be defined along two dimensions, ``alignment'' and ``scaling'', where alignment specifies the temporal implementation of tonal highs (H) and lows (L) relative to structural elements (such as syllables and morae) and their segments. Alignment patterns can be constrained by both phonological and phonetic factors. Among the phonological factors, the grammar of stress-accent languages specifies that the tones of a pitch accent must be aligned with those syllables that are marked as stressed in the lexicon. Moreover, syllable structure can constrain tune-text alignment. For instance, in Neapolitan Italian, the peak of an LH rising accent occurs closer to the offset of the stressed vowel when the vowel is in a closed syllable, and is therefore short. Among the phonetic constraints, one finds facts about the perception of pitch and time, both for speech and for non-speech stimuli. This work investigates the role of alignment in determining tonal target perception for yes/no question and (narrow focus) statement contours in Neapolitan Italian. These contours are characterized by a melodic rise-fall, analyzed here as a sequence of an LH pitch accent plus an HL phrase tone. The separation of the rise and the fall is clear in the case of long focus constituents containing at least two words with independently stressed syllables. In more typical cases, however, this configuration is acoustically realized as a sequence of three tonal targets, LHL, due to ``merging'' of the H tone sequence in nuclear position. This study shows that the precise alignment of each of those tonal events influences the perception of the question/statement contrast. A read speech corpus, produced by two speakers of Neapolitan Italian, was first analyzed to acoustically characterize tonal targets in both yes/no questions and narrow focus statements, with target words differing in syllable structure and segmental environment.
Subsequently, a set of resynthesized stimuli was created, which formed the basis for the perception experiments. Results show that, when the tonal targets for the entire rise-fall are displaced later in time, more stimuli are identified as questions. The results also suggest that F0 height has a minor role in signaling pitch accent differences, while rise and fall slope have no impact. Additionally, when the shape of the peak in the rise-fall is modified so that a high plateau is created, more stimuli are perceived as questions. This phenomenon cannot be accounted for in terms of a parsing difference between the question and the statement phonological tone structures, since those structures are the same. Moreover, the effect was also found for non-native listeners. Specifically, American English listeners showed an effect of peak shape, as well as a similar sensitivity to alignment modifications, when identifying Neapolitan questions vs. statements. This result suggests a universal use of alignment and a psychoacoustic effect of perceived target displacement due to peak shape. Hence, despite acoustic and pragmatic differences between their rise-fall contrasts, American and Neapolitan listeners appear to employ similar perceptual strategies. The Neapolitan results for the syllable structure manipulation are difficult to interpret. On the one hand, the manipulation did not shift the crossover boundary between questions and statements; on the other hand, the response curves for the open and closed syllable continua for the statement modality were significantly different. The results suggest that no look-ahead mechanism is employed when computing perceived target location. That is, question and statement tonal targets are computed relative to the left edge of the stressed syllable, so that stressed vowel duration (which is shorter in closed syllables) has no effect.
A clear category boundary shift was found depending on whether stimuli were resynthesized from a question base or a declarative base utterance. This suggests that cues other than target alignment are also employed when computing the perceived pitch accent contrast. In sum, this work proposes that temporal alignment, both as a production and a perception mechanism, must shape phonological systems of intonational contrast, both within and across languages.
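
The alignment manipulation and the notion of a crossover boundary described above can be illustrated with a minimal sketch. It is not the procedure used in this work: the names `TonalTarget`, `make_continuum`, and `crossover`, the step size, and the linear-interpolation estimate of the 50% point are all assumptions introduced here for illustration, and any numbers passed to these functions would be hypothetical rather than experimental values.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TonalTarget:
    """One tonal target (hypothetical representation)."""
    label: str      # e.g. "L1", "H", "L2" for the LHL rise-fall
    time_ms: float  # time relative to the onset of the stressed syllable
    f0_hz: float    # F0 height, left unchanged by the alignment shift


def shift_targets(targets: Sequence[TonalTarget], shift_ms: float) -> List[TonalTarget]:
    """Displace every target of the rise-fall by the same amount in time."""
    return [TonalTarget(t.label, t.time_ms + shift_ms, t.f0_hz) for t in targets]


def make_continuum(base: Sequence[TonalTarget], step_ms: float, n_steps: int) -> List[List[TonalTarget]]:
    """Build an alignment continuum: step i shifts the whole contour by i * step_ms."""
    return [shift_targets(base, i * step_ms) for i in range(n_steps)]


def crossover(shifts_ms: Sequence[float], prop_question: Sequence[float]) -> Optional[float]:
    """Estimate the shift at which 'question' responses reach 50%.

    Uses linear interpolation between the first pair of adjacent steps
    whose proportions straddle 0.5 (an assumed, simple estimator).
    Returns None if the response curve never crosses 0.5 from below.
    """
    points = list(zip(shifts_ms, prop_question))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None
```

Under these assumptions, a later crossover for one base utterance than for another would correspond to the kind of category boundary shift reported above; a fitted logistic function would be a common alternative to linear interpolation.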
