Specific Person Recognition Based on Local Segmentation and Fusion

With the development of video and image technology, recognizing specific persons in television programs and photo albums has great practical value. However, occlusion of body parts and changes in shooting position and distance are common in real scenes. In this work, we propose a specific person recognition method based on local segmentation and fusion, called PR-LSF, which improves the reliability of person recognition under these conditions. We represent the human body as an aggregate of multiple parts and apply local segmentation to train multiple convolutional neural network (CNN) classifiers. Each part classifier generates an identification decision confidence for its part. An SVM classifier is then trained to weight the decision confidences of all parts and make a comprehensive judgment. To verify the effectiveness of the proposed algorithm, we performed experiments on unoccluded and occluded test sets. The experimental results demonstrate that PR-LSF achieves higher recognition performance than algorithms using a single body part and remains reliable under partial occlusion, multiple scenes, and shooting changes.
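
The fusion step can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: the part names, the confidence values, and the training routine are all hypothetical. In place of a full SVM solver, the sketch learns the fusion weights with unregularized per-sample hinge-loss updates (a margin perceptron), which is only a simplified stand-in for the SVM described above. In practice a library SVM would likely be used instead.

```python
# Hypothetical part names; the abstract does not say which body parts are used.
PARTS = ("face", "head", "upper_body", "full_body")


def fused_score(confidences, weights, bias):
    """Weighted sum of per-part confidences; a positive score means 'target person'."""
    return sum(w * c for w, c in zip(weights, confidences)) + bias


def train_fusion(samples, labels, lr=0.1, max_epochs=5000):
    """Learn fusion weights with per-sample hinge-loss updates.

    This is an unregularized stand-in for the SVM, not a full SVM solver.
    samples: per-part confidence vectors in [0, 1]; labels: +1 or -1.
    """
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, labels):
            if y * fused_score(x, weights, bias) < 1.0:  # inside the margin
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                updated = True
        if not updated:  # every training sample now has margin >= 1
            break
    return weights, bias


# Made-up confidences standing in for the outputs of the part CNNs.
train_x = [
    [0.9, 0.8, 0.7, 0.6],  # target person, all parts visible
    [0.8, 0.9, 0.6, 0.7],  # target person, all parts visible
    [0.1, 0.2, 0.8, 0.7],  # target person, face and head occluded
    [0.1, 0.2, 0.2, 0.3],  # other person
    [0.2, 0.1, 0.3, 0.2],  # other person
    [0.3, 0.2, 0.1, 0.1],  # other person
]
train_y = [1, 1, 1, -1, -1, -1]

weights, bias = train_fusion(train_x, train_y)
for part, w in zip(PARTS, weights):
    print(f"{part}: weight {w:.2f}")
print("occluded sample score:", round(fused_score(train_x[2], weights, bias), 2))
```

Because the decision depends on a weighted combination of all parts, a sample whose face and head confidences are low can still receive a positive fused score when the remaining parts respond strongly, which is the behavior the abstract attributes to PR-LSF under partial occlusion.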
