An efficient approach for clustering face images

Investigations that require the exploitation of large volumes of face imagery are increasingly common in current forensic scenarios (e.g., Boston Marathon bombing), but effective solutions for triaging such imagery (i.e., low importance, moderate importance, and of critical interest) are not available in the literature. General issues for investigators in these scenarios are a lack of systems that can scale to volumes of images of the order of a few million, and a lack of established methods for clustering the face images into the unknown number of persons of interest contained in the collection. As such, we explore best practices for clustering large sets of face images (up to 1 million here) into large numbers of clusters (approximately 200 thousand) as a method of reducing the volume of data to be investigated by forensic analysts. Our analysis involves a performance comparison of several clustering algorithms in terms of the accuracy of grouping face images by identity, run-time, and efficiency in representing large datasets of face images in terms of compact and isolated clusters. For two different face datasets, a mugshot database (PCSO) and the well known unconstrained dataset, LFW, we find the rank-order clustering method to be effective in clustering accuracy, and relatively efficient in terms of run-time.

[1]  G. Krishna,et al.  Agglomerative clustering using the concept of mutual nearest neighbourhood , 1978, Pattern Recognit..

[2]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[3]  Anil K. Jain,et al.  Component-Based Representation in Automated Face Recognition , 2013, IEEE Transactions on Information Forensics and Security.

[4]  Ting Liu,et al.  Clustering Billions of Images with Large Scale Nearest Neighbor Search , 2007, 2007 IEEE Workshop on Applications of Computer Vision (WACV '07).

[5]  Fred Nicolls,et al.  Active shape models with SIFT descriptors and MARS , 2015, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[6]  Justin Zobel,et al.  Clustering near-duplicate images in large collections , 2007, MIR '07.

[7]  Anil K. Jain,et al.  Open source biometric recognition , 2013, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[8]  Jian Sun,et al.  A rank-order distance based clustering algorithm for face tagging , 2011, CVPR 2011.

[9]  Yuandong Tian,et al.  A Face Annotation Framework with Partial Clustering and Interactive Labeling , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Jing Wang,et al.  Scalable k-NN graph construction for visual descriptors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  René Vidal,et al.  Low rank subspace clustering (LRSC) , 2014, Pattern Recognit. Lett..

[12]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[13]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[14]  Jian Sun,et al.  Face recognition with learning-based descriptor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Yousef Saad,et al.  Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection , 2009, J. Mach. Learn. Res..

[16]  David J. Kriegman,et al.  Clustering appearances of objects under varying illumination conditions , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[17]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[18]  Ramesh C. Jain,et al.  Automatic Person Annotation of Family Photo Album , 2006, CIVR.

[19]  Yuandong Tian,et al.  EasyAlbum: an interactive photo annotation system based on face clustering and re-ranking , 2007, CHI.