Mirrored non-maximum suppression for accurate object part localization

Object part localization, including human pose estimation and facial landmark detection, has seen significant progress. Most previous methods, however, overlook two phenomena. First, they typically output a set of candidate pose hypotheses, but the hypothesis with the highest score after Non-Maximum Suppression (NMS) is not always the optimal result. Second, they cannot produce exactly bilaterally symmetric keypoints on mirrored images, even though the training data is always augmented with mirrored images. In fact, the intrinsic relationship between the original image and its mirrored counterpart is helpful for object part localization. In this paper, we propose Mirrored Non-Maximum Suppression (Mirrored NMS), which uses mirrored detections to improve the accuracy of object part localization. Experimental results show that our method improves state-of-the-art accuracy by 1.3% to 3.0% in PCP for human pose estimation and produces more accurate results than averaging multiple hypotheses for facial landmark detection.
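To make the relationship between original and mirrored detections concrete, the following is a minimal sketch and not the algorithm of this paper, whose details the abstract does not give. It assumes keypoints are (x, y) pixel coordinates, that a horizontal flip maps x to width - 1 - x, and that left/right part labels must be swapped after flipping. The function names and the selection rule (pick the candidate that agrees best with any mirrored detection) are hypothetical and serve only as an illustration.

```python
from math import hypot


def unmirror(keypoints, width, swap_pairs):
    """Map keypoints detected on a horizontally flipped image back to the original frame."""
    # Flip x about the vertical image axis (assumed convention: x' = width - 1 - x).
    mapped = [(width - 1 - x, y) for x, y in keypoints]
    # Swap left/right part labels, e.g. left wrist <-> right wrist.
    for i, j in swap_pairs:
        mapped[i], mapped[j] = mapped[j], mapped[i]
    return mapped


def mean_distance(a, b):
    """Average Euclidean distance between corresponding keypoints of two hypotheses."""
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def select_by_mirror_agreement(candidates, mirrored_candidates, width, swap_pairs):
    """Hypothetical rule: return the original-image candidate that lies closest
    to some mirrored-image candidate once the latter is mapped back."""
    mapped = [unmirror(m, width, swap_pairs) for m in mirrored_candidates]

    def disagreement(candidate):
        return min(mean_distance(candidate, m) for m in mapped)

    return min(candidates, key=disagreement)


# Toy example with two parts (index 0 = left, index 1 = right) on a 100-pixel-wide image.
candidates = [[(20, 5), (90, 5)], [(40, 5), (60, 5)]]
mirrored_candidates = [[(10, 5), (80, 5)]]
best = select_by_mirror_agreement(candidates, mirrored_candidates, 100, [(0, 1)])
```

In this toy example the first candidate is selected because it lies one pixel from the mapped-back mirrored detection, whereas the second is far from it. A real system would also have to combine this agreement with the detector's own scores, which the sketch ignores.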
