Toward a model for lexical access based on acoustic landmarks and distinctive features.

This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps: (1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]). (2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. Which cues are extracted depends on the type of landmark and on the environment in which it occurs. (3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of articulator-bound features associated with each landmark (e.g., [lips], [high], [nasal]). These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model.
Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.
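
As a rough illustration of the representation described above, the following Python sketch shows two pieces under simplifying assumptions: a toy version of step (1) that labels peaks, valleys, and abrupt discontinuities in a single band-energy contour, and a toy lexicon keyed by sequences of binary feature bundles. The function names (`detect_landmarks`, `bundle`, `flip`, `lookup`), the 9 dB jump threshold, the energy values, and the feature assignments for the example words are all hypothetical choices made for illustration; they are not taken from the article and do not reproduce its actual detection procedure or feature inventory.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A feature bundle: the set of (feature name, binary value) pairs for one segment.
Bundle = FrozenSet[Tuple[str, bool]]


def bundle(**features: bool) -> Bundle:
    """Build an immutable, hashable feature bundle from keyword arguments."""
    return frozenset(features.items())


def flip(b: Bundle, name: str) -> Bundle:
    """Return a copy of the bundle with the value of one feature inverted."""
    return frozenset((f, (not v) if f == name else v) for f, v in b)


def detect_landmarks(energy_db: List[float],
                     jump_db: float = 9.0) -> List[Tuple[int, str]]:
    """Toy stand-in for step (1) on one band-energy contour (dB per frame).

    A frame-to-frame change of at least jump_db (an assumed threshold) is
    labeled a discontinuity; otherwise local maxima and minima are labeled
    peaks and valleys.
    """
    landmarks = []
    for i in range(1, len(energy_db) - 1):
        prev, cur, nxt = energy_db[i - 1], energy_db[i], energy_db[i + 1]
        if abs(cur - prev) >= jump_db:
            landmarks.append((i, "discontinuity"))
        elif cur > prev and cur >= nxt:
            landmarks.append((i, "peak"))
        elif cur < prev and cur <= nxt:
            landmarks.append((i, "valley"))
    return landmarks


# Simplified, hypothetical feature assignments for three segments.
B = bundle(consonant=True, continuant=False, lips=True, nasal=False)
AE = bundle(vowel=True, high=False)
D = bundle(consonant=True, continuant=False, lips=False, nasal=False)

# Words are stored as sequences of feature bundles.
LEXICON: Dict[Tuple[Bundle, ...], str] = {
    (B, AE, D): "bad",
    (flip(B, "nasal"), AE, D): "mad",
}


def lookup(segments: List[Bundle]) -> Optional[str]:
    """Exact-match lexical lookup on a sequence of feature bundles."""
    return LEXICON.get(tuple(segments))


if __name__ == "__main__":
    contour = [40, 40, 55, 60, 62, 60, 48, 40, 40]  # made-up values
    print(detect_landmarks(contour))
    print(lookup([B, AE, D]))
    print(lookup([flip(B, "nasal"), AE, D]))
```

In this sketch, changing the value of the single feature [nasal] on the first segment turns one lexical entry into another, which mirrors the statement that a change in a feature value can potentially generate a new word. Exact-match lookup is used only for brevity; a model that must cope with variability from enhancement or gestural overlap would need graded evidence for each feature rather than hard binary decisions.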
