A survey on recent progress in the ASAT/SIRKUS paradigm

Automatic Speech Attribute Transcription (ASAT), an ITR project sponsored under NSF grant IIS-04-27113, and Spoken Information Retrieval by Knowledge Utilization in Statistical Speech Processing (SIRKUS), a project funded by the VERDIKT programme of the Research Council of Norway, are two research projects carried out at the Georgia Institute of Technology and the Norwegian University of Science and Technology, respectively. Both aim to investigate and develop new paradigms for speech recognition that are capable of bridging the gap between machine and human performance. The projects approach speech recognition from a more linguistic perspective, motivated by how humans process speech: unlike traditional automatic speech recognition (ASR) systems, humans detect acoustic and auditory cues, weigh and combine them to form theories, and then process these cognitive hypotheses until linguistically and pragmatically consistent speech understanding is achieved. A major goal of the ASAT/SIRKUS paradigm is to develop a detection-based approach to ASR built on attribute detection and knowledge integration. We report the progress of the two projects on two tasks: cross-language and language-universal attribute/phone recognition, and language identification (LID).
