Learning rebalanced human parsing model from imbalanced datasets

Abstract

Human parsing has attracted increasing research attention because of its wide range of applications. However, dataset imbalance remains a challenging problem in this task and directly degrades parsing performance. Dataset imbalance takes several forms: the number of samples may differ across labels, the scales of objects under different labels may vary considerably, some heterogeneous label types may differ far less from one another than other label types do, and in extreme cases images may be mislabeled. In this paper, we propose a rebalanced model for imbalanced human parsing. The model includes two novel blocks: a pre-bilateral awareness block and a combined-order statistics awareness block. The former uses multiscale feature extractors to efficiently capture scale variation in the spatial domain, while the latter exploits information about feature distributions in the channel dimension. Furthermore, we propose an imbalance data-drop algorithm that simultaneously addresses mislabeling and the weighting of labels with few samples. Extensive experiments on three datasets demonstrate that our method effectively addresses data imbalance and improves human parsing performance.
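The abstract does not spell out how the imbalance data-drop algorithm works. As a rough illustration of the two problems it targets, the Python sketch below combines two generic, swapped-in techniques: inverse-frequency weights for labels with few pixels, and dropping training samples whose loss is an outlier (treated as possible mislabeling). The function names `inverse_frequency_weights` and `drop_suspect_samples` and the mean + k × std threshold are hypothetical assumptions, not the authors' method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Hypothetical label weighting: weight each label by the inverse of its
    pixel count, normalised so the weights average to 1 over the labels that
    occur. Labels with few samples receive larger weights."""
    counts = Counter(labels)
    raw = {label: 1.0 / n for label, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {label: w / mean for label, w in raw.items()}


def drop_suspect_samples(losses: Sequence[float], k: float = 2.0) -> List[int]:
    """Hypothetical data drop: return the indices of samples to keep,
    discarding any sample whose loss exceeds mean + k * (population) std.
    Such outliers are assumed here to be likely mislabeled."""
    n = len(losses)
    mu = sum(losses) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in losses) / n)
    threshold = mu + k * sd
    return [i for i, x in enumerate(losses) if x <= threshold]
```

For example, `inverse_frequency_weights([0, 0, 0, 1])` gives label 1 three times the weight of label 0 (1.5 versus 0.5), and `drop_suspect_samples` applied to nine losses of 1.0 plus one loss of 10.0 keeps only the first nine samples. A real system would more likely apply such weights inside a per-pixel cross-entropy loss and recompute the drop decision periodically during training.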
