Paper information - Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks

Approximate Policy Iteration (API) is a reinforcement learning paradigm that can solve high-dimensional, continuous control problems. We propose to exploit API for the closed-loop learning of mappings from images to actions. This approach requires a family of function approximators that map visual percepts to real values. For this purpose, we use Regression Extra-Trees, a fast yet accurate and versatile machine learning algorithm. The inputs of the Extra-Trees are a set of visual features that summarize the informative patterns in the visual signal. We also show how to parallelize the Extra-Tree learning process to further reduce its computational cost, which is often essential in visual tasks. Experimental results on real-world images indicate that combining API with Extra-Trees is a promising framework for the interactive learning of visual tasks.
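
To make the loop described in the abstract concrete, here is a minimal sketch of approximate policy iteration with a randomized-tree regressor as the value-function approximator. It is not the authors' implementation and rests on several assumptions: the task is a hypothetical toy (walking left or right on the interval [0, 1], with reward 1 at the right end), a single scalar state stands in for the visual feature vector, and the trees use one random cut per node, which is a simplification of Extra-Trees (the full method scores several random cuts and keeps the best). All function names and parameters are made up for illustration.

```python
import random

GAMMA, STEP, ACTIONS = 0.9, 0.1, (-1, +1)   # toy task: action 0 = left, 1 = right

def build_tree(X, y, rng):
    """Grow one totally randomized regression tree (one random cut per node)."""
    mean = sum(y) / len(y)
    if len(y) < 2 or max(y) == min(y):
        return mean                           # leaf: mean target value
    j = rng.randrange(len(X[0]))              # random feature index
    lo, hi = min(x[j] for x in X), max(x[j] for x in X)
    cut = rng.uniform(lo, hi)                 # random cut point
    left = [i for i, x in enumerate(X) if x[j] < cut]
    right = [i for i, x in enumerate(X) if x[j] >= cut]
    if not left or not right:
        return mean                           # degenerate cut: stop here
    return (j, cut,
            build_tree([X[i] for i in left], [y[i] for i in left], rng),
            build_tree([X[i] for i in right], [y[i] for i in right], rng))

def predict(forest, x):
    """Average the leaf values reached by x over all trees."""
    total = 0.0
    for node in forest:
        while isinstance(node, tuple):
            j, cut, below, above = node
            node = below if x[j] < cut else above
        total += node
    return total / len(forest)

def step(s, a):
    """Toy dynamics: reward 1 on reaching the right end, which is terminal."""
    nxt = min(1.0, max(0.0, s + STEP * ACTIONS[a]))
    done = nxt >= 1.0
    return (1.0 if done else 0.0), nxt, done

def greedy(q, s):
    """Greedy action for state s under one forest per action."""
    values = [predict(q[a], (s,)) for a in range(len(ACTIONS))]
    return values.index(max(values))          # ties go to action 0 (left)

def approximate_policy_iteration(n_samples=200, n_iters=15, n_sweeps=12,
                                 n_trees=5, seed=0):
    rng = random.Random(seed)
    data = []                                 # tuples (s, a, r, s_next, done)
    for _ in range(n_samples):
        s, a = rng.random(), rng.randrange(len(ACTIONS))
        data.append((s, a) + step(s, a))
    q = [[0.0] for _ in ACTIONS]              # Q = 0, so the first policy is "left"
    for _ in range(n_iters):
        # Policy improvement: act greedily with respect to the previous Q.
        next_action = [greedy(q, d[3]) for d in data]
        # Approximate policy evaluation: repeated regression on backed-up targets.
        q_pi = [[0.0] for _ in ACTIONS]
        for _ in range(n_sweeps):
            targets = [d[2] if d[4] else
                       d[2] + GAMMA * predict(q_pi[a2], (d[3],))
                       for d, a2 in zip(data, next_action)]
            new_q = []
            for a in range(len(ACTIONS)):
                idx = [i for i, d in enumerate(data) if d[1] == a]
                X = [(data[i][0],) for i in idx]
                y = [targets[i] for i in idx]
                new_q.append([build_tree(X, y, rng) for _ in range(n_trees)])
            q_pi = new_q
        q = q_pi
    return q

if __name__ == "__main__":
    q = approximate_policy_iteration()
    print([greedy(q, (i + 0.5) / 10) for i in range(10)])  # 0 = left, 1 = right
```

Each tree in a forest is grown independently of the others, so the inner list comprehension that builds `n_trees` trees is the natural place to distribute work across processes or machines, which is in the spirit of the parallelization mentioned in the abstract.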

[1] Pierre Geurts, et al. Extremely randomized trees, 2006, Machine Learning.

[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[3] Justus H. Piater, et al. Interactive learning of mappings from visual percepts to actions, 2005, ICML.

[4] S. Jodogne. Learning, then Compacting Visual Policies (Extended Abstract), 2005.

[5] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[7] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.

[8] Justus H. Piater, et al. Task-Driven Learning of Spatial Combinations of Visual Features, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[9] Cordelia Schmid, et al. Evaluation of Interest Point Detectors, 2000, International Journal of Computer Vision.

[10] B. Cohen, et al. Incentives Build Robustness in Bit-Torrent, 2003.

[11] Raphaël Marée, et al. Random subwindows for robust image classification, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.

[13] Cordelia Schmid, et al. A Performance Evaluation of Local Descriptors, 2005, IEEE Trans. Pattern Anal. Mach. Intell.