GPS: Group People Segmentation with Detailed Part Inference

General object detection, semantic segmentation, and instance segmentation have progressed noticeably, yet parsing a group of people remains a challenging task for human-centric visual understanding because of severe occlusion and varied poses. In this paper, we present a new large-scale dataset named “GPS (Group People Segmentation)” to support academic study and technology development. GPS contains 14,000 elaborately annotated images with 20 fine-grained semantic category labels related to humans, and is divided into two sub-datasets corresponding to indoor and outdoor scenes that involve various poses, occlusions, and backgrounds. We further propose a novel network, GPSNet, for group people segmentation. GPSNet includes a new “Adjusted RoI Align” module that adjusts the position of each detected person and aligns the RoI features, so that the network does not need to fit the various positions of each person. A fusion of global and local features is also employed to refine the parsing results. Compared with baseline methods, GPSNet achieves the best performance on the GPS dataset.
