Supra-Segmental Feature Based Speaker Trait Detection

Speech utterances convey a rich diversity of information about the speaker in addition to their semantic content. Such information includes speaker traits such as personality, likability, and health/pathology. Detecting speaker traits in human-computer interfaces is an important step toward more efficient and natural human-computer interaction. This study proposes two groups of supra-segmental features for improving speaker trait detection performance. Compared with a baseline system based on 6125-dimensional features, the proposed supra-segmental system not only improves performance by 9.0%, but is also computationally attractive and suited to real-life applications, since it uses fewer than 63 feature dimensions, 99% fewer than the baseline system.
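The dimensionality figures quoted above can be checked with a short calculation. The following is only an illustrative sketch: the function name `dimension_reduction` and the constants are hypothetical, and 63 is used as an upper bound because the text states only that the proposed system uses fewer than 63 dimensions.

```python
def dimension_reduction(baseline_dim: int, proposed_dim: int) -> float:
    """Return the fractional reduction in feature dimensionality."""
    if baseline_dim <= 0:
        raise ValueError("baseline_dim must be positive")
    return 1.0 - proposed_dim / baseline_dim


# Figures taken from the abstract; 63 is an upper bound (assumption),
# since the abstract says "fewer than 63" without giving the exact size.
BASELINE_DIM = 6125
PROPOSED_DIM_UPPER_BOUND = 63

reduction = dimension_reduction(BASELINE_DIM, PROPOSED_DIM_UPPER_BOUND)
print(f"Reduction in feature dimensionality: at least {reduction:.1%}")
```

Even at the upper bound of 63 dimensions, the reduction rounds to 99%, which is consistent with the figure stated in the abstract.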
