A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech

This research work proposes a new Genetic Algorithm (GA) based fusion scheme to effectively fuse the Time–Frequency (T–F) binary masks of voiced and unvoiced speech. Perceptual cues such as the correlogram, cross-correlogram and pitch are commonly used to obtain the T–F binary mask of voiced speech. More recently, researchers have used speech onset and offset cues to segment unvoiced speech from the noisy speech mixture. Most of this work combines the resulting unvoiced speech segments with the voiced speech segments to obtain the T–F binary mask. In contrast, the proposed scheme uses a GA to fuse the T–F binary masks of voiced and unvoiced speech directly, rather than combining their segments. In addition, a new method is proposed to obtain a T–F binary mask from the unvoiced speech segments. The performance of the proposed GA based fusion scheme is evaluated using objective quality and intelligibility measures. The experimental results show that the proposed system improves speech quality, increasing the SNR by an average of 10.74 dB and reducing the noise residue by an average of 26.15% relative to the noisy speech mixture, and improves speech intelligibility, increasing the CSII, NCM and STOI by average values of 0.22, 0.20 and 0.17, respectively, compared with conventional speech segregation systems.
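The abstract does not specify the fusion rule, so the following is only a minimal sketch of how a GA might fuse two T–F binary masks under stated assumptions: each chromosome holds hypothetical fusion parameters (weights w_v, w_u and a threshold theta), and fitness is taken to be the SNR of the masked mixture against a clean training reference. All function and parameter names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of GA-based fusion of voiced and
# unvoiced T-F binary masks. Assumption: the fusion is parameterised by
# weights (w_v, w_u) and a threshold theta, and fitness is the SNR of the
# fused-mask estimate against a clean reference available during training.
import numpy as np

rng = np.random.default_rng(0)

def fuse(mask_v, mask_u, params):
    """Combine voiced/unvoiced binary masks with weights and a threshold."""
    w_v, w_u, theta = params
    return (w_v * mask_v + w_u * mask_u >= theta).astype(float)

def snr_fitness(params, mask_v, mask_u, mixture_tf, clean_tf):
    """Fitness = SNR (dB) of the fused-mask estimate against clean speech."""
    est = fuse(mask_v, mask_u, params) * mixture_tf
    noise = clean_tf - est
    return 10.0 * np.log10(np.sum(clean_tf ** 2) / (np.sum(noise ** 2) + 1e-12))

def ga_fuse(mask_v, mask_u, mixture_tf, clean_tf,
            pop_size=30, generations=50, mutation_rate=0.2):
    # Each chromosome: [w_v, w_u, theta], all drawn from [0, 1)
    pop = rng.random((pop_size, 3))
    for _ in range(generations):
        fitness = np.array([snr_fitness(p, mask_v, mask_u, mixture_tf, clean_tf)
                            for p in pop])
        parents = pop[np.argsort(fitness)[::-1][:pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, 3)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < mutation_rate:           # uniform mutation
                child[rng.integers(3)] = rng.random()
            children.append(child)
        pop = np.vstack([parents, children])
    best = max(pop, key=lambda p: snr_fitness(p, mask_v, mask_u, mixture_tf, clean_tf))
    return fuse(mask_v, mask_u, best), best
```

In this sketch the GA tunes only three global parameters; the scheme reported in the paper may instead operate per T–F unit or use a different fitness criterion, and the quality/intelligibility gains quoted above come from the authors' full system, not from this sketch.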
