IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Publication Information

Most recent semi-supervised video object segmentation (VOS) methods rely on online fine-tuning of deep convolutional neural networks, using the given mask of the first frame or the predicted masks of subsequent frames. However, online fine-tuning is usually time-consuming, which limits the practical use of such methods. We propose a directional deep embedding and appearance learning (DDEAL) method for fast VOS that requires no online fine-tuning. First, we propose a global directional matching module, which can be implemented efficiently with parallel convolutional operations, to learn a semantic pixel-wise embedding that serves as internal guidance. Second, we propose an effective appearance model based on directional statistics that represents the target and the background in a spherical embedding space. Equipped with the global directional matching module and the directional appearance model learning module, DDEAL learns static cues from the labeled first frame and dynamically updates cues from subsequent frames for object segmentation. Our method achieves state-of-the-art VOS performance without online fine-tuning: it reaches a J&F mean score of 74.8% on the DAVIS 2017 dataset and an overall score of 71.3% on the large-scale YouTube-VOS dataset, while running at 25 fps on a single NVIDIA TITAN Xp GPU. A faster variant runs at 31 fps with only a slight loss in accuracy.
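The abstract names the two modules but does not give their equations, so the following is only a minimal pure-Python sketch under stated assumptions, not DDEAL's implementation. Global directional matching is read here as "for each current-frame pixel, the maximum cosine similarity to first-frame target and background pixels"; with unit-norm reference embeddings acting as 1x1 convolution filters, all of these similarities could be computed in parallel. The directional appearance model is read as a two-component von Mises-Fisher (vMF) mixture, whose density on the unit sphere is C_d(κ) exp(κ μᵀx). The function names (`global_directional_matching`, `mean_direction`, `vmf_foreground_prob`, `update_mean_direction`), the shared concentration `kappa`, the equal class priors, and the momentum-based update are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """L2-normalize an embedding so that it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_directional_matching(
    query: Sequence[Vector], reference: Sequence[Vector], ref_labels: Sequence[int]
) -> List[List[float]]:
    """Assumed matching rule: for each query pixel, return
    [max cosine to background refs (label 0), max cosine to target refs (label 1)].

    The nested loop stands in for parallel 1x1 convolutions whose filters are the
    unit-norm reference embeddings, followed by a per-class max.
    """
    out = []
    for q in query:
        best = [-1.0, -1.0]  # cosine similarity is bounded below by -1
        for r, lab in zip(reference, ref_labels):
            best[lab] = max(best[lab], cosine(q, r))
        out.append(best)
    return out


def mean_direction(embeddings: Sequence[Vector], labels: Sequence[int], cls: int) -> Vector:
    """Maximum-likelihood vMF mean direction: the normalized sum of one class's unit vectors."""
    members = [e for e, lab in zip(embeddings, labels) if lab == cls]
    if not members:
        raise ValueError(f"no pixels labeled {cls}")
    total = [sum(e[i] for e in members) for i in range(len(members[0]))]
    return normalize(total)


def vmf_foreground_prob(x: Vector, mu_fg: Vector, mu_bg: Vector, kappa: float) -> float:
    """P(target | x) for a two-component vMF mixture with shared kappa and equal priors.

    The normalizing constant C_d(kappa) cancels, leaving a logistic function of
    kappa * (mu_fg . x - mu_bg . x); both branches avoid overflow in math.exp.
    """
    z = kappa * (cosine(x, mu_fg) - cosine(x, mu_bg))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def update_mean_direction(mu: Vector, new_mu: Vector, momentum: float = 0.9) -> Vector:
    """Hypothetical dynamic update: blend the stored mean direction with one estimated
    from the latest predicted mask, then re-project onto the sphere."""
    return normalize([momentum * a + (1.0 - momentum) * b for a, b in zip(mu, new_mu)])


if __name__ == "__main__":
    # Toy 2-D embeddings for a 4-pixel first frame: label 1 = target, 0 = background.
    first = [normalize(v) for v in ([1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [-0.2, 0.9])]
    first_labels = [1, 1, 0, 0]
    mu_fg = mean_direction(first, first_labels, 1)  # static cue from the labeled frame
    mu_bg = mean_direction(first, first_labels, 0)

    current = [normalize([0.95, 0.15]), normalize([0.0, 1.0])]
    print(global_directional_matching(current, first, first_labels))
    print([round(vmf_foreground_prob(x, mu_fg, mu_bg, kappa=10.0), 3) for x in current])
```

With a shared κ, the vMF normalizing constant cancels, so the target posterior reduces to a logistic function of κ(μ_fgᵀx − μ_bgᵀx). In this sketch the first-frame mean directions play the role of static cues, and `update_mean_direction` is one simple, assumed way to fold in dynamic cues from predicted masks.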

Michael J. Watts | Haibo He | Giorgio Battistelli | Toshihisa Tanaka | Sergio Cruces | Qinglai Wei | Dianhui Wang | Eduardo Bayro-Corrochano | Cristiano Cervellera | Badong Chen | Rhee Man Kil | Shengli Xie | Preben Kidmose | Sander Bohte | Stefano Squartini | Igor Škrjanc | Ahmad Taher Azar | Björn Schuller | Dacheng Tao | Maurizio Filippone | Robi Polikar | Manuel Roveri | Huajin Tang | Wenlian Lu | Chunhua Shen | Adel M. Alimi | Marco Wiering | Changyin Sun | Shuiwang Ji | Murad Abu-Khalaf | Bart Baesens | El-Sayed M. El-Alfy | Ana Madureira | Giorgio Gnecco | Danil V. Prokhorov | Charles W. Anderson | Peter Tiňo | Zhijun Li | Yongduan Song | Robert Legenstein | Stefan Wermter | Juwei Lu | Daoyi Dong | Derong Liu | Hongyi Li | Jiancheng Lv | Madhusudana Shashanka | Jonathan Wu | David Elizondo | Pantelis Bouboulis | Yun Raymond Fu | Jinling Liang | Massimo Panella | Qionghai Dai | Dong Xu | Steven Damelin | Aluizio Fausto | Padua Braga
