Robust Automatic Face Clustering in News Video

Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is a challenging task due to variations in the appearance of the faces, as videos are captured in uncontrolled environments. A person's appearance may vary due to session variations, including lighting and background changes, occlusions, and changes in expression and make-up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus, and incorporate it into a novel two-stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues, after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able to effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
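
The second stage described above, hierarchical agglomerative clustering over per-track models, can be illustrated with a minimal sketch. It assumes each face track has already been reduced to a fixed-length vector (standing in for a Local TVM representation, which is not implemented here), and it assumes cosine distance with complete linkage and a stopping threshold; the abstract does not specify these choices, and the names `cosine_distance`, `agglomerative_cluster` and `threshold` are hypothetical.

```python
import math


def cosine_distance(a, b):
    # Assumed distance measure between two track vectors (non-zero vectors only).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_cluster(track_vectors, threshold):
    """Complete-linkage agglomerative clustering of face-track vectors.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is further apart than `threshold` (an assumed stopping rule).
    Returns a list of clusters, each a sorted list of track indices.
    """
    clusters = [[i] for i in range(len(track_vectors))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(
            cosine_distance(track_vectors[i], track_vectors[j])
            for i in c1
            for j in c2
        )

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    return [sorted(c) for c in clusters]


# Toy data: four tracks, two per (hypothetical) identity.
tracks = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
clusters = agglomerative_cluster(tracks, threshold=0.1)
```

On the toy data, tracks 0 and 1 point in nearly the same direction, as do tracks 2 and 3, so a small threshold keeps the two groups apart while a very large one would merge everything into a single cluster. This brute-force search is cubic or worse in the number of tracks and is meant only to show the merging logic.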
