Automatic speech emotion recognition is the process of recognizing emotions in speech. It has wide applications in areas such as psychiatric care and robotics, and it makes human-computer interaction (HCI) a challenging area of research. Any effective HCI system has two phases: training and testing. The core techniques used in such a system are feature extraction and classification. This paper gives a brief introduction to GFCC feature extraction, an optimization algorithm, and the back-propagation neural network (BPNN) used to classify emotions in speech.

Speaker recognition refers to identifying a person from his or her speech. The speech signal carries the message being spoken, the emotional state of the speaker, and speaker-specific information, so it can be used to recognize both the speaker and the speaker's emotional state. Emotion recognition in speech extracts features from each utterance; it is the process of automatically recognizing who is speaking and in which emotional state the words are spoken, on the basis of features present in the speech signal [1]. Speaker recognition can be text-independent or text-dependent, with text-dependent systems achieving higher accuracy. Speaker recognition is useful in areas such as voice dialling, telephone banking, telephone shopping, database access services, information services, voice mail, security control for confidential information, and remote access to computers [2]. The detection of emotions in speech is also gaining attention across a wide range of applications, including call-centre software, e-learning, gaming, security, and machine translation. The influence of the speaker's emotional state on speaker recognition is very high [3].

The term "emotion" can refer to an extremely complex state associated with a wide variety of mental, physiological, and physical events, so an emotional speech database is valuable for this kind of speaker recognition. In general terms, a speech emotion recognition system is an application of speech processing in which the patterns of derived speech features (e.g., MFCC, pitch) are mapped by a classifier (e.g., an HMM) during the training and testing sessions, using pattern recognition algorithms to detect the emotion corresponding to each pattern. The technique is analogous to a speaker recognition system, but its different approach to detecting emotions makes it intelligent and adds security, achieving better service in various applications [4]. Different techniques are used for feature extraction, which is the first step in emotion recognition; the extracted features are the input to the classifier for classifying the emotions. If the extracted features are chosen carefully, the feature set captures the relevant information in the input data, so the desired task can be performed on this reduced representation instead of the full-size input.

II. STRUCTURE OF SPEECH EMOTION RECOGNITION

Speech emotion recognition needs to extract short-term acoustic and prosodic feature parameters that reflect emotion, and to distinguish emotions by means of a variety of classifiers. The system can be divided into the following stages: signal preprocessing, feature extraction, and speech classification. A minimal code sketch of this pipeline is given below.

[Block diagram of the ER system: Signal Preprocessing -> Feature Extraction -> Speech Classification]
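To make the feature-extraction stage concrete, the following is a minimal sketch of GFCC computation. The specific choices here are illustrative assumptions, not taken from this paper: 64 ERB-spaced gammatone channels, 13 cepstral coefficients, 25 ms frames with a 10 ms hop, and cubic-root compression, roughly following the GFCC formulation used in noise-robust speaker identification [13]. `scipy.signal.gammatone` requires SciPy 1.6 or later.

```python
# Sketch of GFCC extraction (illustrative parameter choices).
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fft import dct

def erb_space(low_hz, high_hz, n_channels):
    """Center frequencies equally spaced on the Glasberg-Moore ERB-rate
    scale E(f) = 21.4 * log10(4.37e-3 * f + 1)."""
    e = np.linspace(21.4 * np.log10(4.37e-3 * low_hz + 1),
                    21.4 * np.log10(4.37e-3 * high_hz + 1), n_channels)
    return (10 ** (e / 21.4) - 1) / 4.37e-3

def gfcc(signal, fs, n_channels=64, n_coeffs=13,
         frame_len=0.025, hop_len=0.010):
    """Gammatone-filterbank cepstral coefficients: pass the signal through
    an ERB-spaced gammatone bank, take per-frame channel energies, apply
    cubic-root compression, and decorrelate with a DCT."""
    centers = erb_space(50.0, 0.9 * fs / 2, n_channels)
    flen, fhop = int(frame_len * fs), int(hop_len * fs)
    n_frames = 1 + max(0, len(signal) - flen) // fhop
    energies = np.empty((n_frames, n_channels))
    for c, fc in enumerate(centers):
        b, a = gammatone(fc, 'iir', fs=fs)      # 4th-order IIR gammatone
        out = lfilter(b, a, signal)
        for t in range(n_frames):
            seg = out[t * fhop : t * fhop + flen]
            energies[t, c] = np.mean(seg ** 2)  # frame energy in channel c
    compressed = np.cbrt(energies)              # cubic-root loudness compression
    return dct(compressed, type=2, norm='ortho', axis=1)[:, :n_coeffs]
```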
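For the classification stage, a back-propagation neural network maps each feature vector to an emotion class. The sketch below is a minimal single-hidden-layer BPNN with sigmoid hidden units and a softmax output, trained by plain gradient descent on the cross-entropy loss; the layer sizes and learning rate are illustrative assumptions, not values reported in this paper.

```python
# Sketch of a one-hidden-layer back-propagation network (illustrative).
import numpy as np

class BPNN:
    """Sigmoid hidden layer, softmax output, plain gradient descent."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        self.h = 1.0 / (1.0 + np.exp(-(x @ self.w1 + self.b1)))  # sigmoid
        z = self.h @ self.w2 + self.b2
        ez = np.exp(z - z.max(axis=1, keepdims=True))             # stable softmax
        return ez / ez.sum(axis=1, keepdims=True)

    def train_step(self, x, y_onehot):
        """One gradient-descent step on the cross-entropy loss."""
        p = self.forward(x)
        dz2 = (p - y_onehot) / len(x)                       # output-layer error
        dz1 = (dz2 @ self.w2.T) * self.h * (1.0 - self.h)   # back-propagated error
        self.w2 -= self.lr * (self.h.T @ dz2)
        self.b2 -= self.lr * dz2.sum(axis=0)
        self.w1 -= self.lr * (x.T @ dz1)
        self.b1 -= self.lr * dz1.sum(axis=0)
```

In practice, the frame-level GFCC matrix would be pooled into one utterance-level vector (for example, the per-coefficient mean over frames), and the network trained by calling `train_step` repeatedly with one-hot emotion labels.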
REFERENCES

[1] Siva Prasad Nandyala, et al., "Automatic Speech Emotion and Speaker Recognition Based on Hybrid GMM and FFBNN," 2014.
[2] Abhang Priyanka, et al., "Emotion Recognition using Speech and EEG Signal - A Review," 2011.
[3] Zheng Wang, et al., "Cough signal recognition with Gammatone Cepstral Coefficients," 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013.
[4] Jaakko Astola, et al., "A study of the effect of emotional state upon text-independent speaker identification," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
[5] Ashu Bansal, et al., "Speaker Recognition Using MFCC Front End Analysis and VQ Modeling Technique for Hindi Words using MATLAB," 2012.
[6] Krishna Mohan Kudiri, et al., "Relative amplitude based features for emotion detection from speech," 2010 International Conference on Signal and Image Processing, 2010.
[7] Amit Sharma, et al., "Speech Emotion Recognition," 2015.
[8] Dipti D. Joshi, et al., "Speech Emotion Recognition: A Review," 2013.
[9] Lukás Burget, et al., "Recent progress in prosodic speaker verification," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
[10] Rohit Sinha, et al., "Speech based Emotion Recognition based on hierarchical decision tree with SVM, BLG and SVR classifiers," 2013 National Conference on Communications (NCC), 2013.
[11] David A. van Leeuwen, et al., "Knowing the non-target speakers: The effect of the i-vector population for PLDA training in speaker recognition," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
[12] Valeri Mladenov, et al., "Neural networks used for speech recognition," 2010.
[13] DeLiang Wang, et al., "Analyzing noise robustness of MFCC and GFCC features in speaker identification," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
[14] Steven van de Par, et al., "Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling," IEEE Transactions on Audio, Speech, and Language Processing, 2012.
[15] Björn W. Schuller, et al., "Exploring Nonnegative Matrix Factorization for Audio Classification: Application to Speaker Recognition," ITG Conference on Speech Communication, 2012.
[16] N. Murali Krishna, et al., "Emotion Recognition using Dynamic Time Warping Technique for Isolated Words," 2011.
[17] S. Lalitha, et al., "Speech emotion recognition," 2014 International Conference on Advances in Electronics Computers and Communications, 2014.
[18] Shin-ichi Sato, et al., "Speaker recognition analysis using running autocorrelation function parameters," 2013.
[19] Maja J. Mataric, et al., "A Framework for Automatic Human Emotion Classification Using Emotion Profiles," IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[20] Akshay S. Utane, et al., "Emotion Recognition through Speech Using Gaussian Mixture Model and Support Vector Machine," 2013.
[21] Malcolm Slaney, et al., "An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank," 1997.