A novel Gaussianized vector representation for natural scene categorization

This paper presents a novel Gaussianized vector representation for scene images, obtained with an unsupervised approach. First, each image is encoded as an orderless bag of features. A global Gaussian Mixture Model (GMM), learned from all images, is then used to randomly assign each feature to one Gaussian component through a multinomial trial whose parameters are the posterior probabilities of the feature under all the Gaussian components. Finally, the normalized means of the features assigned to each Gaussian component are concatenated to form a supervector, which serves as a compact representation of each scene image. We prove that these supervectors follow the standard normal distribution. Experiments on scene categorization tasks show that this vector representation significantly improves performance over the bag-of-features representation.
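The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a diagonal-covariance GMM whose weights, means and variances have already been learned from all images, and it assumes one particular normalization, sqrt(n_k) (x̄_k − μ_k) / σ_k, chosen because each block would then be standard normal if the features assigned to component k were drawn i.i.d. from that component. Filling empty components with zeros is also an assumption. The function names `gmm_posteriors` and `gaussianized_supervector` are hypothetical.

```python
import numpy as np


def gmm_posteriors(X, weights, means, variances):
    """Posterior p(k | x) for each row of X under a diagonal-covariance GMM.

    X: (N, D) features of one image; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]               # (N, K, D)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_pdf + np.log(weights)                   # (N, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)       # avoid overflow in exp
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def gaussianized_supervector(X, weights, means, variances, rng):
    """Hypothetical sketch: multinomial assignment, then stacked normalized means."""
    K, D = means.shape
    post = gmm_posteriors(X, weights, means, variances)
    # One multinomial trial per feature, using its posteriors as probabilities:
    # pick the first component whose cumulative posterior reaches a uniform draw.
    u = rng.random(len(X))[:, None]
    comp = np.minimum((np.cumsum(post, axis=1) < u).sum(axis=1), K - 1)
    blocks = []
    for k in range(K):
        Xk = X[comp == k]
        n_k = len(Xk)
        if n_k == 0:
            # Assumption: a component that received no features contributes zeros.
            blocks.append(np.zeros(D))
            continue
        # Assumed normalization: N(0, I) if the assigned features are
        # i.i.d. N(means[k], diag(variances[k])).
        blocks.append(np.sqrt(n_k) * (Xk.mean(axis=0) - means[k])
                      / np.sqrt(variances[k]))
    return np.concatenate(blocks)                            # length K * D


# Usage with toy parameters standing in for a learned global GMM.
rng = np.random.default_rng(0)
K, D = 4, 2
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D)) * 3
variances = np.ones((K, D))
X = rng.normal(size=(500, D))   # stand-in for one image's local features
sv = gaussianized_supervector(X, weights, means, variances, rng)
print(sv.shape)  # (8,)
```

Because the assignment is a random draw rather than a hard arg-max, two runs with different random seeds can give different supervectors for the same image; the supervector length is always K times D regardless of how many features the image contains, which is what makes it usable as a fixed-size vector for a standard classifier.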
