Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms

We discuss synchronous and asynchronous iterations of the form x^{k+1} = x^k + \gamma(k)(h(x^k) + w^k), where h is a suitable map and {w^k} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case these are stochastic approximation iterations that can be analyzed with the ODE approach, based either on Kushner and Clark's lemma in the synchronous case or on Borkar's theorem in the asynchronous case. Such an analysis, however, requires that the iterates {x^k} be bounded, a fact that is usually difficult to establish. We develop a novel framework for proving boundedness in the deterministic setting, which also applies to the stochastic case whenever the deterministic hypotheses can be verified in the almost sure sense. The framework is based on scaling ideas and on properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). Finally, we apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and relax some of the assumptions of currently available results.
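The iteration x^{k+1} = x^k + \gamma(k)(h(x^k) + w^k) can be illustrated concretely. Below is a minimal sketch (not the paper's algorithm or analysis) of asynchronous Q-learning on a toy stochastic shortest path problem: the transition model, costs, and step-size rule are illustrative assumptions chosen so that the standard conditions on \gamma(k) hold.

```python
import random

# Toy stochastic shortest path problem (illustrative, not from the paper):
# states 0 and 1, absorbing goal state 2, a single action per state that
# advances toward the goal with probability 0.8 and stays put otherwise.
# Each step incurs unit cost until the goal is reached.
random.seed(0)
GOAL = 2
states = [0, 1]
actions = [0]
cost = 1.0

Q = {(s, a): 0.0 for s in states for a in actions}
visits = {key: 0 for key in Q}

def step(s, a):
    """Sample a successor state under the toy transition model."""
    return s + 1 if random.random() < 0.8 else s

for _ in range(20000):
    # Asynchronous scheme: one (state, action) pair is updated per step.
    s = random.choice(states)
    a = 0
    s_next = step(s, a)
    # Cost-to-go of the successor; zero at the absorbing goal state.
    v_next = 0.0 if s_next == GOAL else min(Q[(s_next, b)] for b in actions)
    visits[(s, a)] += 1
    # Diminishing step size gamma(k) = 1/k per pair: sum diverges,
    # sum of squares converges.
    gamma_k = 1.0 / visits[(s, a)]
    # Instance of x^{k+1} = x^k + gamma(k)(h(x^k) + w^k), where h is the
    # (shifted) Bellman operator and w^k is the sampling noise.
    Q[(s, a)] += gamma_k * (cost + v_next - Q[(s, a)])
```

For this toy chain the exact expected costs-to-go are 1/0.8 = 1.25 from state 1 and 2.5 from state 0, and the iterates settle near those values; boundedness of the iterates, which the sketch takes for granted, is exactly what the framework in the paper establishes.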
