A multi-phase blending method with incremental intensity for training detection networks

Object detection is an important topic in visual computing. Although many approaches have been studied, it remains challenging. Image classifiers can be improved by blending training, in which the network is trained on blended images and the corresponding blended labels. However, our experiments show that directly transferring existing blending methods from classification to object detection makes training harder and ultimately degrades performance. Motivated by this finding, this paper presents a multi-phase blending method with incremental blending intensity to improve the accuracy of object detectors. First, to adapt blending to the detection task, we propose a smoothly scheduled, incremental blending intensity that controls the degree of multi-phase blending. Based on this dynamic coefficient, we propose an incremental blending method in which the blending intensity increases smoothly from zero to full, so that more complex and varied training data are created to regularize the network. Second, we design an incremental hybrid loss function to replace the original loss function; the blending intensity in this loss also increases smoothly under the control of the scheduled coefficient. Third, we discard more negative examples in our multi-phase training process than typical training methods do. Together, these steps regularize the network, enhance its generalization through data diversity, and improve detection accuracy. Because the method is applied only during training, it has no negative effect on evaluation. Experiments show that the proposed method improves the generalization of detection networks. On PASCAL VOC and MS COCO, our method outperforms RFBNet, a state-of-the-art one-stage detector for real-time processing.
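
The idea of a blending intensity that rises smoothly from zero to full, and that drives both the image blend and the hybrid loss, can be illustrated with a minimal sketch. The abstract does not give the actual schedule or loss, so everything here is an assumption made for illustration: the half-cosine ramp, the mixup-style linear blend, the cap `max_weight`, and the function names `blending_intensity`, `blend_images` and `hybrid_loss` are all hypothetical and are not the authors' implementation.

```python
import math

import numpy as np


def blending_intensity(step, total_steps):
    """Hypothetical smooth schedule: rises from 0 to 1 along a half-cosine."""
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    return 0.5 * (1.0 - math.cos(math.pi * t))


def blend_images(img_a, img_b, intensity, max_weight=0.5):
    """Mixup-style blend (assumed form): the share of img_b grows with intensity.

    Returns the blended image and the weight given to img_b.
    """
    w = max_weight * intensity
    return (1.0 - w) * img_a + w * img_b, w


def hybrid_loss(loss_a, loss_b, w):
    """Assumed hybrid loss: weights the per-image losses like the images."""
    return (1.0 - w) * loss_a + w * loss_b


if __name__ == "__main__":
    img_a = np.zeros((2, 2), dtype=np.float32)
    img_b = np.ones((2, 2), dtype=np.float32)
    for step in (0, 50, 100):
        s = blending_intensity(step, total_steps=100)
        mixed, w = blend_images(img_a, img_b, s)
        print(step, round(s, 3), round(w, 3), float(mixed.mean()))
```

At step 0 the blend weight is zero, so the network sees unmodified images and the plain loss of the first image; by the final step the second image contributes up to `max_weight`. Under these assumptions, the early phases behave like ordinary detector training and the regularization is introduced gradually.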
