Paper Information - Supra-Segmental Feature Based Speaker Trait Detection

It is well known that, beyond their semantic content, speech utterances convey a rich diversity of information about the speaker. Such information may include speaker traits such as personality, likability, and health/pathology. Detecting speaker traits in human-computer interfaces is an important step toward more efficient and natural computer engagement. This study proposes two groups of supra-segmental features for improving speaker trait detection performance. Compared with a baseline system based on 6125-dimensional features, the proposed supra-segmental system not only improves performance by 9.0% but is also computationally attractive and suitable for real-life applications, since it uses features of fewer than 63 dimensions, about 99% fewer than the baseline system.

John H. L. Hansen | Gang Liu
