Continuous learning in computer vision

In this thesis we focus on continuous learning, and specifically on continuous learning in the context of computer vision. Computer vision aims at interpreting the world from its visual dimension, in an automatic manner. The world in general is characterized by continuity, and so is the visual world in particular. Therefore, this thesis delegates the task of coping with the continuous aspects of the computer vision problems approached, to the learning algorithms. Rather than discretizing the problems in our pursuit to find a solution, we focus on the continuity and ways in which we can learn it or integrate it in our proposed models. Each one of the chapters of this thesis aims at learning continuous aspects of the world such as motion, or relative entities such as “goodness” and “importance”, or concentrates on the problem of learning continuous functions, on the whole.

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Mark Mackenzie,et al.  Asymmetric kernel regression , 2004, IEEE Transactions on Neural Networks.

[3]  Gabriela Csurka,et al.  Generic Visual Categorization Using Weak Geometry , 2006, Toward Category-Level Object Recognition.

[4]  Léon Bottou,et al.  Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.

[5]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Arnold W. M. Smeulders,et al.  Asymmetric kernel in Gaussian Processes for learning target variance , 2018, Pattern Recognit. Lett..

[7]  Mubarak Shah,et al.  Recognizing 50 human action categories of web videos , 2012, Machine Vision and Applications.

[8]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[9]  Nazli Ikizler-Cinbis,et al.  Learning actions from the Web , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[11]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[12]  Santiago Manen,et al.  Prime Object Proposals with Randomized Prim's Algorithm , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[14]  Neil D. Lawrence,et al.  Overlapping Mixtures of Gaussian Processes for the Data Association Problem , 2011, Pattern Recognit..

[15]  Michalis K. Titsias,et al.  Variational Learning of Inducing Variables in Sparse Gaussian Processes , 2009, AISTATS.

[16]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[17]  Zoubin Ghahramani,et al.  Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[18]  Malte Kuss,et al.  Approximate inference for robust Gaussian process regression , 2005 .

[19]  Peng Li,et al.  Distance Metric Learning with Eigenvalue Optimization , 2012, J. Mach. Learn. Res..

[20]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Daniel E. Kalpokas Perceptual Experience and Seeing-as , 2015 .

[22]  Jürgen Schmidhuber,et al.  Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation , 2015, NIPS.

[23]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Andreas Buja,et al.  Stress functions for nonlinear dimension reduction, proximity analysis, and graph drawing , 2013, J. Mach. Learn. Res..

[25]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[26]  Neelima Chavali,et al.  Object-Proposal Evaluation Protocol is ‘Gameable’ , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Baolin Yin,et al.  Cracking BING and Beyond , 2014, BMVC.

[28]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[29]  Jan C. van Gemert,et al.  Exploiting photographic style for category-level image classification by generalizing the spatial pyramid , 2011, ICMR.

[30]  Simon Osindero,et al.  An Alternative Infinite Mixture Of Gaussian Process Experts , 2005, NIPS.

[31]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[32]  Nelson L. Max,et al.  Visualizing 3D velocity fields near contour surfaces , 1994, Proceedings Visualization '94.

[33]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Bernt Schiele,et al.  How good are detection proposals, really? , 2014, BMVC.

[36]  Carl E. Rasmussen,et al.  Infinite Mixtures of Gaussian Process Experts , 2001, NIPS.

[37]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[38]  Species are dead. Long live genes , 1998 .

[39]  Neil D. Lawrence,et al.  Bayesian Gaussian Process Latent Variable Model , 2010, AISTATS.

[40]  Hang Li,et al.  Asymmetric Kernel Learning , 2010 .

[41]  Matthew B. Blaschko,et al.  Learning a category independent object detection cascade , 2011, 2011 International Conference on Computer Vision.

[42]  Ying Wu,et al.  Motion from blur , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Lin Sun,et al.  Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Wenye Li Gaussian Process Learning: A Divide-and-Conquer Approach , 2014, ISNN.

[45]  David J. Fleet,et al.  Efficient Optimization for Sparse Gaussian Process Regression , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Stephen Tyree,et al.  Non-linear Metric Learning , 2012, NIPS.

[47]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[48]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[49]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[50]  Peter Kontschieder,et al.  Structured class-labels in random forests for semantic image labelling , 2011, 2011 International Conference on Computer Vision.

[51]  Miguel Lázaro-Gredilla,et al.  Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression , 2013, NIPS.

[52]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[53]  Miguel Lázaro-Gredilla,et al.  Variational Heteroscedastic Gaussian Process Regression , 2011, ICML.

[54]  Ivan Laptev,et al.  Recognizing human actions in still images: a study of bag-of-features and part-based representations , 2010, BMVC.

[55]  Yang Liu,et al.  Detect2Rank: Combining Object Detectors Using Learning to Rank , 2014, IEEE Transactions on Image Processing.

[56]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[57]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[58]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[59]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Lehel Csató,et al.  Sparse On-Line Gaussian Processes , 2002, Neural Computation.

[61]  Brian Cabral,et al.  Imaging vector fields using line integral convolution , 1993, SIGGRAPH.

[62]  Antonio Criminisi,et al.  Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2012, Found. Trends Comput. Graph. Vis..

[63]  Jiri Matas,et al.  WaldBoost - learning for time constrained sequential detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[64]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[66]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[67]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[68]  David A. Forsyth,et al.  Rendering synthetic objects into legacy photographs , 2011, ACM Trans. Graph..

[69]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[70]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[71]  Edward H. Adelson,et al.  Motion without movement , 1991, SIGGRAPH.

[72]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[73]  Björn W. Schuller,et al.  Introducing CURRENNT: the munich open-source CUDA recurrent neural network toolkit , 2015, J. Mach. Learn. Res..

[74]  Jitendra Malik,et al.  Contextual Action Recognition with R*CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[75]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[76]  Antonio Torralba,et al.  Accidental pinhole and pinspeck cameras: Revealing the scene outside the picture , 2012, CVPR.

[77]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[78]  Neil D. Lawrence,et al.  Fast Sparse Gaussian Process Methods: The Informative Vector Machine , 2002, NIPS.

[79]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[80]  Ivan Laptev,et al.  Efficient Feature Extraction, Encoding, and Classification for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Duy Nguyen-Tuong,et al.  Local Gaussian Process Regression for Real Time Online Model Learning , 2008, NIPS.

[82]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[83]  Yi Yang,et al.  A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Shiliang Sun,et al.  Kernel regression with sparse metric learning , 2013, J. Intell. Fuzzy Syst..

[85]  Ramakant Nevatia,et al.  DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[87]  Yoshua Bengio,et al.  ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks , 2015, ArXiv.

[88]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[89]  Horst Bischof,et al.  Large scale metric learning from equivalence constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[91]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[92]  Amir Globerson,et al.  Metric Learning by Collapsing Classes , 2005, NIPS.

[93]  Cor J. Veenman,et al.  Episode-Constrained Cross-Validation in Video Concept Retrieval , 2009, IEEE Transactions on Multimedia.

[94]  Christopher K. I. Williams,et al.  Discovering Hidden Features with Gaussian Processes Regression , 1998, NIPS.

[95]  Arnold W. M. Smeulders,et al.  Featureless: Bypassing feature extraction in action categorization , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[96]  Edwin V. Bonilla,et al.  Fast Allocation of Gaussian Process Experts , 2014, ICML.

[97]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[98]  G. Leibniz,et al.  New Essays on Human Understanding. , 1981 .

[99]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[100]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[101]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[102]  Frédéric Jurie,et al.  Motion Models that Only Work Sometimes , 2012, BMVC.

[103]  Chao Yuan,et al.  Variational Mixture of Gaussian Process Experts , 2008, NIPS.

[104]  Volker Tresp,et al.  Mixtures of Gaussian Processes , 2000, NIPS.

[105]  Theo Gevers,et al.  Evaluation of Color STIPs for Human Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[106]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[107]  Trevor Hastie,et al.  Multi-class AdaBoost ∗ , 2009 .

[108]  Pietro Perona,et al.  Online, Real-Time Tracking Using a Category-to-Individual Detector , 2014, ECCV.

[109]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[110]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[111]  Stefano Soatto,et al.  Boosting Convolutional Features for Robust Object Proposals , 2015, ArXiv.

[112]  Jiri Matas,et al.  Weighted Sampling for Large-Scale Boosting , 2008, BMVC.

[113]  K. Tsuda Support Vector Classi er with Asymmetric Kernel Functions , 1998 .

[114]  Kilian Q. Weinberger,et al.  Metric Learning for Kernel Regression , 2007, AISTATS.

[115]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[116]  Thierry Denoeux,et al.  Evidential combination of pedestrian detectors , 2014, BMVC.

[117]  Cordelia Schmid,et al.  Efficient Action Localization with Approximately Normalized Fisher Vectors , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[118]  Inderjit S. Dhillon,et al.  Metric and Kernel Learning Using a Linear Transformation , 2009, J. Mach. Learn. Res..

[119]  Ming-Hsuan Yang,et al.  Online Sparse Gaussian Process Regression and Its Applications , 2011, IEEE Transactions on Image Processing.

[120]  Antonio Criminisi,et al.  Filter Forests for Learning Data-Dependent Convolutional Kernels , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[121]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[122]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[123]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[124]  Vittorio Ferrari,et al.  Looking out of the window: object localization by joint analysis of all windows in the image , 2015, ArXiv.

[125]  Arnold W. M. Smeulders,et al.  Déjà Vu: - Motion Prediction in Static Images , 2018, ECCV.

[126]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[127]  Martial Hebert,et al.  Activity Forecasting , 2012, ECCV.

[128]  Mubarak Shah,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[129]  Yu Qiao,et al.  Action Recognition with Stacked Fisher Vectors , 2014, ECCV.

[130]  Joachim Denzler,et al.  Large-Scale Gaussian Process Classification with Flexible Adaptive Histogram Kernels , 2012, ECCV.

[131]  Wolfram Burgard,et al.  Most likely heteroscedastic Gaussian process regression , 2007, ICML '07.

[132]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[133]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[134]  Neil D. Lawrence,et al.  Gaussian Processes for Big Data , 2013, UAI.

[135]  Tinne Tuytelaars,et al.  Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[136]  Arnold W. M. Smeulders,et al.  Large scale Gaussian Process for overlap-based object proposal scoring , 2016, Comput. Vis. Image Underst..

[137]  Zoubin Ghahramani,et al.  Local and global sparse Gaussian process approximations , 2007, AISTATS.

[138]  Cristian Sminchisescu,et al.  Greedy Block Coordinate Descent for Large Scale Gaussian Process Regression , 2008, UAI.

[139]  Cordelia Schmid,et al.  Learning to Track for Spatio-Temporal Action Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[140]  Cordelia Schmid,et al.  P-CNN: Pose-Based CNN Features for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[141]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[142]  Cees Snoek,et al.  What do 15,000 object categories tell us about classifying and localizing actions? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[143]  Kurt von Fritz The discovery of incommensurability by Hippasus of Metapontum , 2004 .

[144]  Antonio Torralba,et al.  HOGgles: Visualizing Object Detection Features , 2013, 2013 IEEE International Conference on Computer Vision.

[145]  Mark J. Schervish,et al.  Nonstationary Covariance Functions for Gaussian Process Regression , 2003, NIPS.

[146]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[147]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[148]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[149]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[150]  Sebastian Nowozin,et al.  Decision Jungles: Compact and Rich Models for Classification , 2013, NIPS.

[151]  R. Atchley A continuity theory of normal aging. , 1989, The Gerontologist.

[152]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).