Anytime Recognition of Objects and Scenes

Humans can perceive a scene at a glance and obtain a deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations require an Anytime recognition ability, which is rarely considered in computer vision research. We present a method for learning dynamic policies to optimize Anytime performance in visual architectures. Our model sequentially orders feature computation and performs subsequent classification. Crucially, decisions are made at test time and depend on observed data and intermediate results. We show the applicability of this system to standard problems in scene and object recognition. On suitable datasets, we can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy; this provides a new view on the time course of human visual perception.
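The abstract names two mechanisms without implementation details: a test-time policy that orders feature computation based on intermediate results, and a semantic back-off that trades specificity for accuracy. The following is a minimal Python sketch of how the two could fit together, under stated assumptions. The class hierarchy, features, costs and per-class evidence are invented toy values. `choose` is a hypothetical hand-written rule standing in for the learned policy; it is not the method described here. `back_off` uses posterior mass as a proxy for the "desired level of accuracy". All names (`PARENT`, `Feature`, `choose`, `back_off`, `anytime_run`) are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy (child -> parent); the leaves are the classifier's classes.
PARENT = {"cat": "animal", "dog": "animal", "car": "vehicle", "bus": "vehicle",
          "animal": "entity", "vehicle": "entity"}
LEAVES = ["cat", "dog", "car", "bus"]
ROOT = "entity"


def ancestors(node: str) -> List[str]:
    """The node itself followed by its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def node_mass(node: str, probs: Dict[str, float]) -> float:
    """Posterior mass of a node: the sum over the leaves it covers."""
    return sum(p for leaf, p in probs.items() if node in ancestors(leaf))


def back_off(probs: Dict[str, float], target: float) -> str:
    """Deepest node whose mass reaches `target` (assumed proxy for accuracy)."""
    nodes = set(PARENT) | set(PARENT.values())
    ok = [n for n in nodes if node_mass(n, probs) >= target]
    return max(ok, key=lambda n: (len(ancestors(n)), node_mass(n, probs)), default=ROOT)


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


@dataclass
class Feature:
    name: str
    cost: float
    specialty: str              # hypothetical: the node this feature is good at splitting
    evidence: Dict[str, float]  # hypothetical per-leaf log-evidence for one fixed image


def choose(feats: List[Feature], probs: Dict[str, float]) -> Feature:
    """Hypothetical stand-in for a learned policy: specialty relevance per unit cost."""
    return max(feats, key=lambda f: node_mass(f.specialty, probs) / f.cost)


def anytime_run(features: List[Feature], budget: float,
                target: float) -> List[Tuple[float, str, str]]:
    """Compute features until the budget is exhausted; record a prediction after each."""
    scores = {leaf: 0.0 for leaf in LEAVES}
    remaining, spent, trace = list(features), 0.0, []
    while True:
        affordable = [f for f in remaining if spent + f.cost <= budget]
        if not affordable:
            return trace
        f = choose(affordable, softmax(scores))  # depends on intermediate results
        remaining.remove(f)
        spent += f.cost
        for leaf, s in f.evidence.items():
            scores[leaf] += s
        trace.append((spent, f.name, back_off(softmax(scores), target)))


# Invented toy features for a single image.
feats = [
    Feature("gist", 1.0, "entity", {"cat": 1.0, "dog": 1.0, "car": -1.0, "bus": -1.0}),
    Feature("animal_parts", 2.0, "animal", {"cat": 2.0, "dog": -2.0}),
    Feature("vehicle_parts", 2.0, "vehicle", {"car": 2.0, "bus": -2.0}),
]
print(anytime_run(feats, budget=3.0, target=0.8))
# → [(1.0, 'gist', 'animal'), (3.0, 'animal_parts', 'cat')]
```

Because `choose` reads the current posterior, the cheap global feature shifts mass toward "animal", which in turn makes the animal-specific feature the next pick; the reported label moves from the coarse "animal" to the specific "cat" as more budget is spent, and interrupting after the first step still yields a usable answer.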
