Convolutional Neural Network With Second-Order Pooling for Underwater Target Classification

Underwater target classification using passive sonar remains a critical problem because of the changeable ocean environment. Convolutional neural networks (CNNs) have been successful at learning invariant features through local filtering and max pooling. In this paper, we propose a novel classification framework that combines a CNN architecture with second-order pooling (SOP) to capture temporal correlations in the time-frequency (T-F) representation of radiated acoustic signals. The T-F inputs are extracted with the constant-Q transform (CQT), and the convolutional layers learn local features from them with a set of kernel filters. In place of max pooling, the proposed SOP operator learns the co-occurrences of different CNN filters from the temporal trajectory of the CNN features in each frequency subband. To preserve frequency distinctions, the correlated features of each frequency subband are retained separately. The pooling results are normalized with signed square-root and $l_{2}$ normalization and then fed to a softmax classifier. The whole network can be trained end to end. To explore generalization to unseen conditions, the proposed CNN model is evaluated on real radiated acoustic signals recorded at new sea depths. The experimental results show that the proposed method yields an 8% improvement in classification accuracy over state-of-the-art deep learning methods.
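
As a rough illustration of the pooling step described in the abstract, the sketch below applies per-subband second-order pooling to a CNN feature tensor and then the signed square-root and $l_{2}$ normalization. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the tensor layout (filters, subbands, frames), the use of an uncentered time average of outer products, the function name `second_order_pool`, and the `eps` stabilizer are all choices made here for illustration.

```python
import numpy as np


def second_order_pool(features, eps=1e-12):
    """Per-subband second-order pooling (illustrative sketch).

    features: array of shape (filters, subbands, frames), assumed to be
    the output of the last convolutional layer.
    Returns a 1-D vector of length subbands * filters * filters.
    """
    n_filters, n_subbands, n_frames = features.shape

    # For each subband, average the outer product of the filter responses
    # over time. Entry (f, i, j) measures how strongly filters i and j
    # co-occur in subband f. Result shape: (subbands, filters, filters).
    pooled = np.einsum("ift,jft->fij", features, features) / n_frames

    # Keep every subband's matrix separate by concatenating rather than
    # summing over the frequency axis.
    vec = pooled.reshape(-1)

    # Signed square-root, then l2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


# Toy usage: 4 filters, 3 frequency subbands, 10 time frames.
rng = np.random.default_rng(0)
toy_features = rng.standard_normal((4, 3, 10))
descriptor = second_order_pool(toy_features)
print(descriptor.shape)
```

The output length grows with the square of the number of filters, so in a sketch like this the descriptor size is controlled mainly by how many filters the last convolutional layer has. The resulting vector would then be passed to a softmax classifier.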
