Closed-Loop Learning of Visual Control Policies

In this paper we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of a few highly informative local descriptors, which are selected incrementally in a sequence of attempts to remove perceptual aliasing. We also address the problem of overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when local descriptors alone are not powerful enough to completely disambiguate the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical "Car on the Hill" control problem.
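
As an illustration of the partitioning idea only, the following minimal Python sketch represents the classifier as a binary tree whose internal nodes test whether one local descriptor is present in an image, and whose leaves are the perceptual classes that a reinforcement learning algorithm would treat as states. The class and method names, the use of strings as stand-ins for descriptors, and the `split` operation are assumptions made for this sketch. The abstract does not specify how aliased classes are detected or how the splitting feature is chosen, so that step is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    """A leaf carries a class id; an internal node tests one feature."""
    class_id: Optional[int] = None
    feature: Optional[str] = None      # hypothetical descriptor label
    present: Optional["Node"] = None   # subtree when the feature is detected
    absent: Optional["Node"] = None    # subtree when it is not


class FeatureClassifier:
    """Maps the set of features detected in an image to a perceptual class."""

    def __init__(self) -> None:
        # Initially every image falls into a single class (maximal aliasing).
        self.root = Node(class_id=0)
        self.num_classes = 1

    def classify(self, features: Iterable[str]) -> int:
        detected = set(features)
        node = self.root
        while node.feature is not None:
            node = node.present if node.feature in detected else node.absent
        return node.class_id

    def split(self, class_id: int, feature: str) -> None:
        """Refine one class by testing an additional feature.

        Images that contain the feature keep the old class id; the others
        are moved to a newly created class.
        """
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.feature is not None:
                stack.extend([node.present, node.absent])
            elif node.class_id == class_id:
                node.present = Node(class_id=class_id)
                node.absent = Node(class_id=self.num_classes)
                node.feature = feature
                node.class_id = None
                self.num_classes += 1
                return
        raise ValueError(f"unknown class id: {class_id}")


clf = FeatureClassifier()
clf.split(0, "door")     # images with "door" stay in class 0, others go to 1
clf.split(1, "window")   # class 1 is refined further
```

In a closed-loop setting, the class ids returned by `classify` would index the state dimension of a value table, and `split` would be called whenever the learner decides that a class still groups states requiring different actions.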


[268]  Ashutosh Saxena,et al.  High speed obstacle avoidance using monocular vision and reinforcement learning , 2005, ICML.

[269]  Mehdi Khamassi,et al.  Actor–Critic Models of Reinforcement Learning in the Basal Ganglia: From Natural to Artificial Rats , 2005, Adapt. Behav..

[270]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[271]  Shimon Ullman,et al.  Feature hierarchies for object classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[272]  Raphaël Marée Classification automatique d'images par arbres de décision , 2005 .

[273]  Sébastien Jodogne,et al.  Controlling an Agent by Focusing its Attention on Interactively Selected Patterns , 2005 .

[274]  Dana H. Ballard,et al.  Learning to perceive and act by trial and error , 1991, Machine Learning.

[275]  Justus H. Piater,et al.  Object tracking using color interest points , 2005, IEEE Conference on Advanced Video and Signal Based Surveillance, 2005..

[276]  Rémi Munos,et al.  Error Bounds for Approximate Value Iteration , 2005, AAAI.

[277]  Justus H. Piater,et al.  Interactive learning of mappings from visual percepts to actions , 2005, ICML.

[278]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[279]  Justus H. Piater,et al.  Task-Driven Learning of Spatial Combinations of Visual Features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[280]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[281]  Marc Van Droogenbroeck,et al.  A Video-Based Human-Computer Interaction System for Audio-Visual Immersion , 2005 .

[282]  Pierre Wolper,et al.  An effective decision procedure for linear arithmetic over the integers and reals , 2005, TOCL.

[283]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[284]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[285]  J. M. Porta,et al.  Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments , 2011, J. Artif. Intell. Res..

[286]  Justus H. Piater,et al.  Statistical Learning of Visual Feature Hierarchies , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[287]  J. Piater,et al.  Apprentissage Interactif de Liaisons Directes entre Perceptions Visuelles et Actions , 2005 .

[288]  Matthew B. Blaschko,et al.  Combining Local and Global Image Features for Object Class Recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[289]  S. Jodogne Learning, then Compacting Visual Policies (Extended Abstract) , 2005 .

[290]  Justus H. Piater,et al.  Reinforcement Learning of Perceptual Classes using Q Learning Updates , 2005, Artificial Intelligence and Applications.

[291]  Lucas Paletta,et al.  Q-learning of sequential attention for visual object recognition from informative local descriptors , 2005, ICML.

[292]  Stepán Obdrzálek,et al.  Sub-linear Indexing for Large Scale Object Recognition , 2005, BMVC.

[293]  Rémi Munos,et al.  Policy Gradient in Continuous Time , 2006, J. Mach. Learn. Res..

[294]  Tomás Martínez-Marín,et al.  Fast Reinforcement Learning for Vision-guided Mobile Robots , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[295]  Hiroshi Murase,et al.  Visual learning and recognition of 3-d objects from appearance , 2005, International Journal of Computer Vision.

[296]  Marco Wiering QV(λ)-learning: A New On-policy Reinforcement Learning Algorithm , 2005 .

[297]  Pierre Geurts,et al.  Extremely randomized trees , 2006, Machine Learning.

[298]  Marc Van Droogenbroeck,et al.  Robust Analysis of Silhouettes by Morphological Size Distributions , 2006, ACIVS.

[299]  F. Scalzo,et al.  Unsupervised Learning of Dense Hierarchical Appearance Representations , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[300]  Louis Wehenkel,et al.  Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.

[301]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[302]  M. Sugisaka,et al.  Direct-vision-based reinforcement learning in a real mobile robot , 2006, Artificial Life and Robotics.

[303]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[304]  Artur Arsenio,et al.  Reinforcing robot perception of multi-modal events through repetition and redundancy , 2006 .

[305]  Justus H. Piater,et al.  Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks , 2006, ECML.

[306]  Justus H. Piater,et al.  Task-Driven Discretization of the Joint Space of Visual Percepts and Continuous Actions , 2006, ECML.

[307]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[308]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[309]  Cyril Briquet,et al.  What is the Grid? Tentative Definitions Beyond Resource Coordination , 2006 .

[310]  H. Robbins A Stochastic Approximation Method , 1951 .

[311]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .