Large-Scale Unsupervised Object Discovery

Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Extensive experiments with COCO [42] and OpenImages [35] demonstrate that, in the single-object discovery setting, where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than, the state of the art for medium-scale datasets (up to 120K images), and over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting, where multiple objects are sought in each image, LOD is over 14% better in average precision (AP) than all other methods for datasets ranging from 20K to 1.7M images.

Figure 1: Sample UOD results obtained by LOD on the OpenImages dataset [35], which contains 1.7M images. Ground-truth boxes are shown in yellow, and predictions are in red. Best viewed in color.
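
To make the idea of "discovery as ranking" concrete, the following is a minimal sketch of PageRank-style power iteration on a small weighted graph. It is not the LOD formulation itself, whose graph construction and scoring are not given in this abstract. The function name `pagerank_scores`, the toy similarity matrix, and the damping value 0.85 are all assumptions chosen for illustration; each node can be read as a hypothetical candidate region, and each weight as a hypothetical similarity between two regions.

```python
from typing import List


def pagerank_scores(weights: List[List[float]], damping: float = 0.85,
                    tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Rank nodes of a weighted graph by power iteration (PageRank-style).

    weights[i][j] >= 0 is an illustrative "vote" from node i to node j.
    Returns one score per node; the scores sum to 1.
    """
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by nodes without outgoing weight is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sums[i] == 0.0)
        new_scores = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            if out_sums[i] == 0.0:
                continue
            # Node i passes its score to neighbors in proportion to the weights.
            share = damping * scores[i] / out_sums[i]
            for j in range(n):
                if weights[i][j] > 0.0:
                    new_scores[j] += share * weights[i][j]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy symmetric similarity graph over four hypothetical regions.
# Region 2 is strongly similar to regions 0 and 1, so it should rank first.
toy_weights = [
    [0.0, 1.0, 3.0, 0.0],
    [1.0, 0.0, 3.0, 0.0],
    [3.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
toy_scores = pagerank_scores(toy_weights)
best_region = max(range(len(toy_scores)), key=lambda i: toy_scores[i])
print(best_region, [round(s, 3) for s in toy_scores])
```

Each iteration only needs a matrix-vector product, which is the property that makes this family of methods a natural fit for distributed computation on large graphs.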

[1] Yong Jae Lee, et al. Shape discovery from unlabeled image collections, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2] O. Perron, et al. Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus, 1907.

[3] Christoph H. Lampert, et al. Unsupervised Object Discovery: A Comparison, 2010, International Journal of Computer Vision.

[4] Fei-Fei Li, et al. Efficient Image and Video Co-localization with Frank-Wolfe Algorithm, 2014, ECCV.

[5] Stan Sclaroff, et al. Deep Metric Learning to Rank, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Ting Zhao, et al. Pyramid Feature Attention Network for Saliency Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Jiayu Tang, et al. Non-negative matrix factorisation for object class discovery and image auto-annotation, 2008, CIVR '08.

[9] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.

[11] David P. Dobkin, et al. A search engine for 3D models, 2003, TOGS.

[12] Ross B. Girshick, et al. Fast R-CNN, 2015, 1504.08083.

[13] Hyunjung Shim, et al. Attention-Based Dropout Layer for Weakly Supervised Object Localization, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Pietro Perona, et al. Towards automatic discovery of object categories, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[15] Chunxiao Liu, et al. Person re-identification by manifold ranking, 2013, 2013 IEEE International Conference on Image Processing.

[16] Francis R. Bach, et al. Learning with Submodular Functions: A Convex Optimization Perspective, 2011, Found. Trends Mach. Learn.

[17] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Jianguo Zhang, et al. The PASCAL Visual Object Classes Challenge, 2006.

[19] Nikos Komodakis, et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Patrick Pérez, et al. Unsupervised Image Matching and Object Discovery as Optimization, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Shumeet Baluja, et al. VisualRank: Applying PageRank to Large-Scale Image Search, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Ingmar Posner, et al. GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations, 2019, ICLR.

[24] C. Lawrence Zitnick, et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.

[25] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[26] Hui Zhang, et al. Localized Content-Based Image Retrieval, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Feiping Nie, et al. Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Leo Katz, et al. A new status index derived from sociometric analysis, 1953.

[29] Zhuowen Tu, et al. Unsupervised object class discovery via saliency-guided multiple class learning, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30] David Dagan Feng, et al. Robust saliency detection via regularized random walks ranking, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Hyunjung Shim, et al. PsyNet: Self-Supervised Approach to Object Localization Using Point Symmetric Transformation, 2020, AAAI.

[32] Georg Heigold, et al. Object-Centric Learning with Slot Attention, 2020, NeurIPS.

[33] Yao Li, et al. Deep Descriptor Transforming for Image Co-Localization, 2017, IJCAI.

[34] Jean Ponce, et al. Discriminative clustering for image co-segmentation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35] Ross B. Girshick, et al. Mask R-CNN, 2017, 1703.06870.

[36] Fei Wang, et al. Maximum Margin Multiple Instance Clustering With Applications to Image and Text Clustering, 2011, IEEE Transactions on Neural Networks.

[37] Yao Li, et al. Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution, 2016, ECCV.

[38] Michael W. Mahoney, et al. Implementing regularization implicitly via approximate eigenvector computation, 2010, ICML.

[39] Jean Ponce, et al. Unsupervised Layered Image Decomposition into Object Prototypes, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[40] Ming Tang, et al. Robust tracking via weakly supervised ranking SVM, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Zhi-Hua Zhou, et al. Multi-instance clustering with applications to multi-instance prediction, 2009, Applied Intelligence.

[42] Matthieu Cord, et al. Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning, 2020, ArXiv.

[43] Luc Van Gool, et al. The 2005 PASCAL Visual Object Classes Challenge, 2005, MLCW.

[44] Xiu-Shen Wei, et al. Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming, 2017, ArXiv.

[45] Trevor Darrell, et al. Unsupervised Learning of Categories from Sets of Partially Matching Image Features, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[46] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Yi Yang, et al. Self-produced Guidance for Weakly-supervised Object Localization, 2018, ECCV.

[48] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[49] Antonio Torralba, et al. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, 2009, NIPS.

[50] Qi Zou, et al. Mining Objects: Fully Unsupervised Object Discovery and Localization From a Single Image, 2019, ArXiv.

[51] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[52] Renjie Hu, et al. Saliency detection via PageRank and local spline regression, 2018, J. Electronic Imaging.

[53] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54] Cordelia Schmid, et al. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Amy Nicole Langville, et al. Google's PageRank and beyond - the science of search engine rankings, 2006.

[56] Carl D. Meyer, et al. Deeper Inside PageRank, 2004, Internet Math.

[57] Tao Mei, et al. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Vineeth N. Balasubramanian, et al. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks, 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[59] Gabriel Pinski, et al. Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics, 1976, Inf. Process. Manag.

[60] Quanshi Zhang, et al. Mining And-Or Graphs for Graph Matching and Object Discovery, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61] Shahram Payandeh, et al. Application of Modified PageRank Algorithm for Anomaly Detection in Movements of Older Adults, 2019, International journal of telemedicine and applications.

[62] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[63] Jian Sun, et al. Salient object detection by composition, 2011, 2011 International Conference on Computer Vision.

[64] Klaus Greff, et al. Multi-Object Representation Learning with Iterative Variational Inference, 2019, ICML.

[65] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66] Eric P. Xing, et al. On multiple foreground cosegmentation, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[67] Yunchao Wei, et al. Inter-Image Communication for Weakly Supervised Localization, 2020, ECCV.

[68] Jean Ponce, et al. Toward unsupervised, multi-object discovery in large-scale image collections, 2020, ECCV.

[69] Matthew Botvinick, et al. MONet: Unsupervised Scene Decomposition and Representation, 2019, ArXiv.

[70] Abhishek Das, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[71] Seong Joon Oh, et al. Evaluating Weakly Supervised Object Localization Methods Right, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[72] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[73] Fei-Fei Li, et al. Co-localization in Real-World Images, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[74] Alexei A. Efros, et al. Discovering objects and their location in images, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[75] L. S. Shapley, et al. College Admissions and the Stability of Marriage, 2013, Am. Math. Mon.

[76] Yi Yang, et al. Adversarial Complementary Learning for Weakly Supervised Object Localization, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77] Alexei A. Efros, et al. Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[78] Santiago Manen, et al. Prime Object Proposals with Randomized Prim's Algorithm, 2013, 2013 IEEE International Conference on Computer Vision.

[79] Christos Faloutsos, et al. Unsupervised modeling of object categories using link analysis techniques, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[80] R. Mises, et al. Praktische Verfahren der Gleichungsauflösung, 1929.