Bypassing Words in Automatic Speech Recognition

Automatic speech recognition (ASR) is usually defined as the transformation of an acoustic signal into words. Although the transformation to words is useful in many cases, this definition does not cover every context in which ASR could be used. Once the constraint that an ASR system must output words is relaxed, two modifications that reduce the search space become possible: 1) using syllables instead of words in the recognizer’s language model; and 2) adding a concept model that transforms syllable strings into concept strings, where a concept groups related words and phrases. This paper presents preliminary positive results on the use of syllables and concepts in speech recognition and outlines our current efforts to verify the Syllable-Concept Hypothesis (SCH).
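The second modification, mapping syllable strings to concept strings, can be pictured with a small sketch. Everything in it is an illustrative assumption rather than the authors' model. The hand-written syllabification table `SYLLABLES`, the concept inventory `CONCEPTS`, and labels such as `GO` and `CITY_DENVER` are all hypothetical. The greedy longest-match mapping is one simple choice among many. A syllable-based recognizer would emit syllables directly, so `syllabify` exists here only to produce test input from text.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written syllabifications (illustrative only; a real
# system would use a syllabifier or a syllabified pronunciation lexicon).
SYLLABLES: Dict[str, List[str]] = {
    "i": ["i"], "want": ["want"], "to": ["to"], "fly": ["fly"],
    "would": ["would"], "like": ["like"], "travel": ["trav", "el"],
    "denver": ["den", "ver"], "boston": ["bos", "ton"],
}

# Hypothetical concept inventory: each concept collects several related
# words or phrases, expressed here as syllable sequences.
CONCEPTS: Dict[Tuple[str, ...], str] = {
    ("want", "to"): "WANT",
    ("would", "like", "to"): "WANT",
    ("fly",): "GO",
    ("trav", "el"): "GO",
    ("den", "ver"): "CITY_DENVER",
    ("bos", "ton"): "CITY_BOSTON",
}


def syllabify(sentence: str) -> List[str]:
    """Turn a word string into a syllable string (raises KeyError for unknown words)."""
    syllables: List[str] = []
    for word in sentence.split():
        syllables.extend(SYLLABLES[word])
    return syllables


def to_concepts(syllables: List[str]) -> List[str]:
    """Map a syllable string to a concept string by greedy longest match."""
    max_len = max(len(key) for key in CONCEPTS)
    out: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest span first so multi-syllable phrases win.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            span = tuple(syllables[i:i + n])
            if span in CONCEPTS:
                out.append(CONCEPTS[span])
                i += n
                break
        else:
            # No concept covers this syllable, so pass it through unchanged.
            out.append(syllables[i])
            i += 1
    return out


if __name__ == "__main__":
    for s in ["i want to fly to denver", "i would like to travel to boston"]:
        print(s, "->", to_concepts(syllabify(s)))
```

Under these toy assumptions, the two utterances differ in wording and syllable count. Both reduce to the same concept skeleton (`i WANT GO to CITY_...`). This is the kind of collapse a concept model is meant to exploit when it narrows what downstream components must distinguish.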
