Paper information

Large-Scale Unsupervised Object Discovery

Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Extensive experiments with COCO [42] and OpenImages [35] demonstrate the following. In the single-object discovery setting, where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than, the state of the art for medium-scale datasets (up to 120K images). It is also over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting, where multiple objects are sought in each image, LOD is over 14% better in average precision (AP) than all other methods for datasets ranging from 20K to 1.7M images.

Figure 1: Sample UOD results obtained by LOD on the OpenImages dataset [35], which contains 1.7M images. Ground-truth boxes are shown in yellow, and predictions are in red. Best viewed in color.
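The abstract's central idea is to cast UOD as a ranking problem so that eigenvector and link-analysis machinery applies. A small sketch can make that idea concrete. It rests on assumptions of ours, not on the paper's exact formulation. It assumes a symmetric, nonnegative matrix of similarities between region proposals drawn from different images. It then scores each proposal with the matrix's leading (Perron) eigenvector, computed by power iteration. Finally, it keeps the top-1 proposal per image (single-object setting) or the top-k proposals (multi-object setting). The function names `rank_proposals` and `select_objects` are hypothetical, as are the toy similarity values and the choice of plain eigenvector centrality over alternatives such as damped PageRank.

```python
import numpy as np


def rank_proposals(similarity: np.ndarray, num_iters: int = 200, tol: float = 1e-10) -> np.ndarray:
    """Score proposals by the leading (Perron) eigenvector of a symmetric,
    nonnegative similarity matrix, using shifted power iteration.

    Hypothetical input: similarity[i, j] >= 0 measures how well region proposal i
    matches proposal j from a *different* image (same-image entries are zero).
    """
    n = similarity.shape[0]
    x = np.full(n, 1.0 / np.sqrt(n))  # positive, unit-norm start vector
    for _ in range(num_iters):
        # Multiply by (S + I): same eigenvectors as S, but no sign oscillation.
        y = similarity @ x + x  # entrywise y >= x > 0, so its norm is never zero
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return y
        x = y
    return x


def select_objects(scores: np.ndarray, image_ids: np.ndarray, k: int = 1) -> dict[int, list[int]]:
    """Keep the k highest-scoring proposals of each image:
    k = 1 mimics single-object discovery, k > 1 multi-object discovery."""
    selected = {}
    for img in np.unique(image_ids):
        idx = np.flatnonzero(image_ids == img)  # proposals belonging to this image
        order = idx[np.argsort(-scores[idx], kind="stable")]
        selected[int(img)] = order[:k].tolist()
    return selected


# Toy collection (made-up numbers): 3 images x 2 proposals each. Proposals 0, 2, 4
# show a recurring object; 1, 3, 5 are background that matches nothing well.
image_ids = np.array([0, 0, 1, 1, 2, 2])
is_object = np.array([True, False, True, False, True, False])
S = np.where(is_object[:, None] & is_object[None, :], 0.9, 0.1)
S[image_ids[:, None] == image_ids[None, :]] = 0.0  # keep only inter-image edges

scores = rank_proposals(S)
print(select_objects(scores, image_ids, k=1))  # {0: [0], 1: [2], 2: [4]}
```

The iteration uses S + I rather than S. This shifts every eigenvalue by one and leaves the eigenvectors unchanged. For a nonnegative symmetric matrix, the shift also prevents the oscillation plain power iteration shows when S has an eigenvalue equal to minus its largest one, as happens for bipartite similarity graphs. Each step costs one matrix-vector product, and that operation is easy to run on sparse, distributed data.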

[1] Yong Jae Lee et al. Shape discovery from unlabeled image collections, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2] O. Perron et al. Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus, 1907.

[3] Christoph H. Lampert et al. Unsupervised Object Discovery: A Comparison, 2010, International Journal of Computer Vision.

[4] Fei-Fei Li et al. Efficient Image and Video Co-localization with Frank-Wolfe Algorithm, 2014, ECCV.

[5] Stan Sclaroff et al. Deep Metric Learning to Rank, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Bolei Zhou et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Ting Zhao et al. Pyramid Feature Attention Network for Saliency Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Jiayu Tang et al. Non-negative matrix factorisation for object class discovery and image auto-annotation, 2008, CIVR '08.

[9] Ali Farhadi et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Koen E. A. van de Sande et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.

[11] David P. Dobkin et al. A search engine for 3D models, 2003, ACM Transactions on Graphics (TOG).

[12] Ross B. Girshick et al. Fast R-CNN, 2015, arXiv:1504.08083.

[13] Hyunjung Shim et al. Attention-Based Dropout Layer for Weakly Supervised Object Localization, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Pietro Perona et al. Towards automatic discovery of object categories, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[15] Chunxiao Liu et al. Person re-identification by manifold ranking, 2013, 2013 IEEE International Conference on Image Processing.

[16] Francis R. Bach et al. Learning with Submodular Functions: A Convex Optimization Perspective, 2011, Found. Trends Mach. Learn.

[17] Trevor Darrell et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Jianguo Zhang et al. The PASCAL Visual Object Classes Challenge, 2006.

[19] Nikos Komodakis et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Patrick Pérez et al. Unsupervised Image Matching and Object Discovery as Optimization, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Shumeet Baluja et al. VisualRank: Applying PageRank to Large-Scale Image Search, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Ali Farhadi et al. YOLO9000: Better, Faster, Stronger, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Ingmar Posner et al. GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations, 2019, ICLR.

[24] C. Lawrence Zitnick et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.

[25] Rajeev Motwani et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[26] Hui Zhang et al. Localized Content-Based Image Retrieval, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Feiping Nie et al. Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Leo Katz et al. A new status index derived from sociometric analysis, 1953.

[29] Zhuowen Tu et al. Unsupervised object class discovery via saliency-guided multiple class learning, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30] David Dagan Feng et al. Robust saliency detection via regularized random walks ranking, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Hyunjung Shim et al. PsyNet: Self-Supervised Approach to Object Localization Using Point Symmetric Transformation, 2020, AAAI.

[32] Georg Heigold et al. Object-Centric Learning with Slot Attention, 2020, NeurIPS.

[33] Yao Li et al. Deep Descriptor Transforming for Image Co-Localization, 2017, IJCAI.

[34] Jean Ponce et al. Discriminative clustering for image co-segmentation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35] Ross B. Girshick et al. Mask R-CNN, 2017, arXiv:1703.06870.

[36] Fei Wang et al. Maximum Margin Multiple Instance Clustering With Applications to Image and Text Clustering, 2011, IEEE Transactions on Neural Networks.

[37] Yao Li et al. Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution, 2016, ECCV.

[38] Michael W. Mahoney et al. Implementing regularization implicitly via approximate eigenvector computation, 2010, ICML.

[39] Jean Ponce et al. Unsupervised Layered Image Decomposition into Object Prototypes, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[40] Ming Tang et al. Robust tracking via weakly supervised ranking SVM, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Zhi-Hua Zhou et al. Multi-instance clustering with applications to multi-instance prediction, 2009, Applied Intelligence.

[42] Matthieu Cord et al. Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning, 2020, arXiv.

[43] Luc Van Gool et al. The 2005 PASCAL Visual Object Classes Challenge, 2005, MLCW.

[44] Xiu-Shen Wei et al. Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming, 2017, arXiv.

[45] Trevor Darrell et al. Unsupervised Learning of Categories from Sets of Partially Matching Image Features, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[46] Kilian Q. Weinberger et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Yi Yang et al. Self-produced Guidance for Weakly-supervised Object Localization, 2018, ECCV.

[48] Sergey Brin et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[49] Antonio Torralba et al. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, 2009, NIPS.

[50] Qi Zou et al. Mining Objects: Fully Unsupervised Object Discovery and Localization From a Single Image, 2019, arXiv.

[51] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[52] Renjie Hu et al. Saliency detection via PageRank and local spline regression, 2018, J. Electronic Imaging.

[53] Kaiming He et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54] Cordelia Schmid et al. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Amy Nicole Langville et al. Google's PageRank and beyond: the science of search engine rankings, 2006.

[56] Carl D. Meyer et al. Deeper Inside PageRank, 2004, Internet Math.

[57] Tao Mei et al. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Vineeth N. Balasubramanian et al. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks, 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[59] Gabriel Pinski et al. Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics, 1976, Inf. Process. Manag.

[60] Quanshi Zhang et al. Mining And-Or Graphs for Graph Matching and Object Discovery, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61] Shahram Payandeh et al. Application of Modified PageRank Algorithm for Anomaly Detection in Movements of Older Adults, 2019, International Journal of Telemedicine and Applications.

[62] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[63] Jian Sun et al. Salient object detection by composition, 2011, 2011 International Conference on Computer Vision.

[64] Klaus Greff et al. Multi-Object Representation Learning with Iterative Variational Inference, 2019, ICML.

[65] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66] Eric P. Xing et al. On multiple foreground cosegmentation, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[67] Yunchao Wei et al. Inter-Image Communication for Weakly Supervised Localization, 2020, ECCV.

[68] Jean Ponce et al. Toward unsupervised, multi-object discovery in large-scale image collections, 2020, ECCV.

[69] Matthew Botvinick et al. MONet: Unsupervised Scene Decomposition and Representation, 2019, arXiv.

[70] Abhishek Das et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[71] Seong Joon Oh et al. Evaluating Weakly Supervised Object Localization Methods Right, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[72] Pietro Perona et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[73] Fei-Fei Li et al. Co-localization in Real-World Images, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[74] Alexei A. Efros et al. Discovering objects and their location in images, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[75] L. S. Shapley et al. College Admissions and the Stability of Marriage, 2013, Am. Math. Mon.

[76] Yi Yang et al. Adversarial Complementary Learning for Weakly Supervised Object Localization, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77] Alexei A. Efros et al. Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[78] Santiago Manen et al. Prime Object Proposals with Randomized Prim's Algorithm, 2013, 2013 IEEE International Conference on Computer Vision.

[79] Christos Faloutsos et al. Unsupervised modeling of object categories using link analysis techniques, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[80] R. von Mises et al. Praktische Verfahren der Gleichungsauflösung, 1929.