A hybrid technique for speech segregation and classification using a sophisticated deep neural network

Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation; however, music segregation becomes challenging in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using a deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden-layer segregation model is applied to remove stationary noise, and dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of the proposed method for speech segregation. The qualitative and quantitative analyses carried out on the three datasets demonstrate the efficiency of the proposed method compared to state-of-the-art speech segregation and classification methods.
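The following is a minimal, hypothetical sketch of the three-stage pipeline the abstract describes: a DBN-style stack that masks the music component of a noisy spectrogram, an RNN stage that suppresses stationary noise, and a Fisher (linear discriminant) classifier for the final speech classification. Layer sizes, feature choices, and all module and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the segregation/classification pipeline (not the paper's code).
import torch
import torch.nn as nn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class SegregationDBN(nn.Module):
    """DBN-like stack that predicts a soft time-frequency mask for the music component."""

    def __init__(self, n_bins=513, hidden=(1024, 512)):
        super().__init__()
        layers, prev = [], n_bins
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.Sigmoid()]   # RBM-style sigmoid units
            prev = h
        layers += [nn.Linear(prev, n_bins), nn.Sigmoid()]  # mask values in [0, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, spec):             # spec: (frames, n_bins) magnitude spectrogram
        return self.net(spec) * spec     # music-suppressed spectrogram


class DenoisingRNN(nn.Module):
    """Recurrent stage that suppresses stationary noise frame by frame."""

    def __init__(self, n_bins=513, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, spec):             # spec: (batch, frames, n_bins)
        h, _ = self.rnn(spec)
        return torch.relu(self.out(h))   # enhanced spectrogram


def classify_speech(train_features, train_labels, test_features):
    """Fisher linear discriminant classifier applied to pooled features of the enhanced signal."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_features, train_labels)
    return lda.predict(test_features)
```

In this sketch the DBN output is used as a spectral mask and the RNN operates on the masked spectrogram, which is one plausible way to chain the segregation and denoising stages described in the abstract; the actual ordering, training objectives, and dictionary construction for the Fisher step would follow the paper itself.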
