Powerset Fusion Network for Target Classification in Unattended Ground Sensors

This paper presents the powerset fusion network (PFN), an end-to-end multimodal fusion approach based on a convolutional neural network (CNN), for classifying moving targets from acoustic and seismic signals in unattended ground sensor (UGS) systems. We first introduce powerset fusion, which is designed to fuse the different modalities thoroughly. We then explore fusing representations at all abstraction levels using the proposed fully hybrid framework. We also examine how multi-task learning (MTL) can be used to obtain better representations, cope with missing modalities, and save embedded storage space. Finally, we present a novel multimodal cost function that accelerates learning. Experimental results on the field dataset show that PFN outperforms popular deep learning fusion systems.
