Co-Interest Person Detection from Multiple Wearable Camera Videos

Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle the new problem of locating the co-interest person (CIP), i.e., the person who draws attention from most camera wearers, in temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even when a group of people with similar appearance is present in the view. More specifically, we detect a set of persons in each frame as CIP candidates and then build a Conditional Random Field (CRF) model to select the candidate with consistent motion patterns across videos and high spatio-temporal consistency within each video. We collect three sets of wearable-camera videos to test the proposed algorithm. All the people involved have similar appearances in the collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm.
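
The selection step can be illustrated with a much-simplified sketch. It is not the CRF formulation of the paper: it assumes a single video, a chain over consecutive frames, and precomputed per-candidate costs that stand in for cross-video motion consistency. The function name `select_candidates`, the `unary` and `positions` inputs, and the `smooth_weight` parameter are all hypothetical, and exact dynamic programming (Viterbi) replaces whatever inference the full model requires.

```python
import numpy as np

def select_candidates(unary, positions, smooth_weight=1.0):
    """Pick one candidate per frame by minimizing a chain energy (Viterbi).

    unary[t][k]     -- assumed cost of candidate k in frame t; lower means its
                       motion agrees better with the other videos
    positions[t][k] -- (x, y) center of candidate k in frame t
    smooth_weight   -- assumed weight of the spatial smoothness term
    """
    n_frames = len(unary)
    cost = [np.asarray(unary[0], dtype=float)]  # best energy ending at each candidate
    back = []                                   # back[t][k]: best predecessor in frame t
    for t in range(1, n_frames):
        prev_pos = np.asarray(positions[t - 1], dtype=float)
        cur_pos = np.asarray(positions[t], dtype=float)
        # pairwise[j, k]: distance between candidate j (frame t-1) and k (frame t)
        pairwise = np.linalg.norm(prev_pos[:, None, :] - cur_pos[None, :, :], axis=2)
        total = cost[-1][:, None] + smooth_weight * pairwise
        back.append(total.argmin(axis=0))
        cost.append(total.min(axis=0) + np.asarray(unary[t], dtype=float))
    # Backtrack from the cheapest candidate in the last frame.
    labels = [int(cost[-1].argmin())]
    for t in range(n_frames - 2, -1, -1):
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]

# Toy input: candidate 0 moves smoothly, candidate 1 jumps around.
unary = [[0.1, 0.5], [0.4, 0.3], [0.1, 0.5]]
positions = [[(0, 0), (10, 10)], [(1, 0), (50, 50)], [(2, 0), (10, 10)]]
smooth_choice = select_candidates(unary, positions, smooth_weight=1.0)
greedy_choice = select_candidates(unary, positions, smooth_weight=0.0)
```

With the smoothness term switched off, the sketch simply takes the cheapest candidate in each frame. With it switched on, a candidate that is slightly cheaper in one frame but far from its neighbors in time is rejected, which is the intuition behind requiring spatio-temporal consistency within each video.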
