Paper Information - Closed-Loop Learning of Visual Control Policies

Closed-Loop Learning of Visual Control Policies

In this paper, we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of a few highly informative local descriptors, which are selected incrementally in a sequence of attempts to remove perceptual aliasing. We also address the problem of overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when local descriptors are not powerful enough to completely disambiguate the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical "Car on the Hill" control problem.
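
The idea of placing a feature-based classifier in front of a reinforcement learner can be illustrated with a minimal sketch. It is not the authors' implementation: the names `PresenceClassifier`, `split`, and `q_update` are hypothetical, visual features are reduced to plain strings rather than local descriptors, and tabular Q-learning is assumed as a stand-in for the unspecified reinforcement learning algorithm. The classifier is a binary tree whose internal nodes test whether a feature is present in the image; each leaf is one perceptual class, and an aliased class is refined by splitting its leaf on a new feature.

```python
from collections import defaultdict


class Node:
    """Tree node: a leaf carries a class id, an internal node tests one feature."""

    def __init__(self, class_id):
        self.feature = None   # None means this node is a leaf
        self.present = None   # subtree followed when the feature is in the image
        self.absent = None    # subtree followed when the feature is missing
        self.class_id = class_id


class PresenceClassifier:
    """Hypothetical classifier mapping a set of detected features to a class id."""

    def __init__(self):
        self.root = Node(0)   # initially every image falls into class 0
        self.n_classes = 1

    def classify(self, features):
        node = self.root
        while node.feature is not None:
            node = node.present if node.feature in features else node.absent
        return node.class_id

    def split(self, class_id, feature):
        """Refine an aliased class by testing for one additional feature."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.feature is None:
                if node.class_id == class_id:
                    node.feature = feature
                    node.absent = Node(class_id)          # keeps the old id
                    node.present = Node(self.n_classes)   # gets a fresh id
                    self.n_classes += 1
                    return
            else:
                stack.extend([node.present, node.absent])
        raise ValueError(f"no leaf with class id {class_id}")


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on perceptual classes (assumed RL back end)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


clf = PresenceClassifier()
clf.split(0, "door")                      # class 0 was aliased: split on "door"
s = clf.classify({"wall"})                # image without the feature
s_next = clf.classify({"door", "wall"})   # image with the feature
Q = defaultdict(float)
q_update(Q, s, "forward", 1.0, s_next, ["forward", "turn"])
```

In this sketch the learner only ever sees class ids, so refining the classifier changes the state space the Q-table is defined over. How aliased classes are detected and which feature is chosen for a split are left out here.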

Justus H. Piater | Sébastien Jodogne


[271] Shimon Ullman,et al. Feature hierarchies for object classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[272] Raphaël Marée. Classification automatique d'images par arbres de décision , 2005 .

[273] Sébastien Jodogne,et al. Controlling an Agent by Focusing its Attention on Interactively Selected Patterns , 2005 .

[274] Dana H. Ballard,et al. Learning to perceive and act by trial and error , 1991, Machine Learning.

[275] Justus H. Piater,et al. Object tracking using color interest points , 2005, IEEE Conference on Advanced Video and Signal Based Surveillance, 2005..

[276] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.

[277] Justus H. Piater,et al. Interactive learning of mappings from visual percepts to actions , 2005, ICML.

[278] Cordelia Schmid,et al. A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[279] Justus H. Piater,et al. Task-Driven Learning of Spatial Combinations of Visual Features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[280] Pietro Perona,et al. A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[281] Marc Van Droogenbroeck,et al. A Video-Based Human-Computer Interaction System for Audio-Visual Immersion , 2005 .

[282] Pierre Wolper,et al. An effective decision procedure for linear arithmetic over the integers and reals , 2005, TOCL.

[283] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[284] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[285] J. M. Porta,et al. Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments , 2011, J. Artif. Intell. Res..

[286] Justus H. Piater,et al. Statistical Learning of Visual Feature Hierarchies , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[287] J. Piater,et al. Apprentissage Interactif de Liaisons Directes entre Perceptions Visuelles et Actions , 2005 .

[288] Matthew B. Blaschko,et al. Combining Local and Global Image Features for Object Class Recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[289] S. Jodogne. Learning, then Compacting Visual Policies (Extended Abstract) , 2005 .

[290] Justus H. Piater,et al. Reinforcement Learning of Perceptual Classes using Q Learning Updates , 2005, Artificial Intelligence and Applications.

[291] Lucas Paletta,et al. Q-learning of sequential attention for visual object recognition from informative local descriptors , 2005, ICML.

[292] Stepán Obdrzálek,et al. Sub-linear Indexing for Large Scale Object Recognition , 2005, BMVC.

[293] Rémi Munos,et al. Policy Gradient in Continuous Time , 2006, J. Mach. Learn. Res..

[294] Tomás Martínez-Marín,et al. Fast Reinforcement Learning for Vision-guided Mobile Robots , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[295] Hiroshi Murase,et al. Visual learning and recognition of 3-d objects from appearance , 2005, International Journal of Computer Vision.

[296] Marco Wiering. QV(λ)-learning: A New On-policy Reinforcement Learning Algorithm , 2005 .

[297] Pierre Geurts,et al. Extremely randomized trees , 2006, Machine Learning.

[298] Marc Van Droogenbroeck,et al. Robust Analysis of Silhouettes by Morphological Size Distributions , 2006, ACIVS.

[299] F. Scalzo,et al. Unsupervised Learning of Dense Hierarchical Appearance Representations , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[300] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.

[301] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.

[302] M. Sugisaka,et al. Direct-vision-based reinforcement learning in a real mobile robot , 2006, Artificial Life and Robotics.

[303] Vincent Lepetit,et al. Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[304] Artur Arsenio,et al. Reinforcing robot perception of multi-modal events through repetition and redundancy , 2006 .

[305] Justus H. Piater,et al. Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks , 2006, ECML.

[306] Justus H. Piater,et al. Task-Driven Discretization of the Joint Space of Visual Percepts and Continuous Actions , 2006, ECML.

[307] Daniel P. Huttenlocher,et al. Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[308] Cordelia Schmid,et al. Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[309] Cyril Briquet,et al. What is the Grid? Tentative Definitions Beyond Resource Coordination , 2006 .

[310] H. Robbins. A Stochastic Approximation Method , 1951 .

[311] Christopher Hunt,et al. Notes on the OpenSURF Library , 2009 .