A formal proof of the 𝜖-optimality of discretized pursuit algorithms

Learning Automata (LA) can be regarded as the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are the fastest, and of these, the discretized algorithms have been shown to converge even faster than their continuous counterparts. However, it has recently been reported that the proofs of 𝜖-optimality given over the past three decades for all such algorithms are flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the 𝜖-optimality of the Discretized Pursuit Algorithm (DPA) which does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given (Zhang et al., Appl Intell 41:974–985, 2014) for an absorbing version of the CPA, which utilizes the single-action Hoeffding's inequality, the current proof invokes what we shall refer to as the "multi-action" version of Hoeffding's inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the 𝜖-optimality of the other EAs that possess absorbing states.
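To make the "discretized jumps" and "absorbing barriers" mentioned above concrete, the following is a minimal sketch of a discretized pursuit scheme. It is an illustration under stated assumptions and not the exact formulation analyzed in the paper: the function name `discretized_pursuit` and all of its parameters are hypothetical, the environment is assumed to return Bernoulli rewards, the update is assumed to be of the reward-inaction type with step size 1/(rN) for r actions and resolution N, and the run is assumed to stop once the action probability vector reaches a unit vector.

```python
import random


def discretized_pursuit(reward_probs, resolution=100, init_samples=10,
                        max_steps=100_000, seed=0):
    """Illustrative discretized pursuit automaton (hypothetical helper).

    reward_probs: assumed Bernoulli reward probability of each action.
    Returns (action probability vector, number of interaction steps).
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    # Probabilities are stored as integer multiples of delta = 1 / (r * N),
    # so every update is an exact jump on the discretized grid.
    total = r * resolution
    units = [resolution] * r          # p_i = 1 / r initially
    rewards = [0] * r                 # number of rewards per action
    pulls = [0] * r                   # number of selections per action

    # Assumption: estimates are initialized by sampling each action a few times.
    for i in range(r):
        for _ in range(init_samples):
            pulls[i] += 1
            rewards[i] += rng.random() < reward_probs[i]

    steps = 0
    while steps < max_steps and max(units) < total:
        action = rng.choices(range(r), weights=units)[0]
        rewarded = rng.random() < reward_probs[action]
        pulls[action] += 1
        rewards[action] += rewarded
        steps += 1
        if rewarded:
            # Pursue the action with the highest current reward estimate.
            best = max(range(r), key=lambda i: rewards[i] / pulls[i])
            for j in range(r):
                if j != best and units[j] > 0:
                    units[j] -= 1     # p_j <- max(p_j - delta, 0)
            units[best] = total - sum(u for j, u in enumerate(units) if j != best)
        # On a penalty the probability vector is left unchanged (assumed).

    return [u / total for u in units], steps


if __name__ == "__main__":
    probs, steps = discretized_pursuit([0.9, 0.2, 0.1])
    print(probs, steps)
```

Keeping the probabilities as integer counts is a design choice of this sketch: it avoids floating-point drift, so the automaton lands exactly on a unit vector after finitely many jumps, which is the absorbing behaviour the abstract refers to.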

[1] N. Fisher, et al. Probability Inequalities for Sums of Bounded Random Variables, 1994.

[2] B.J. Oommen, et al. On discretizing estimator-based learning algorithms, 1991, Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics.

[3] P. S. Sastry, et al. Estimator Algorithms for Learning Automata, 1986.

[4] B. John Oommen, et al. Learning Automata-Based Solutions to the Nonlinear Fractional Knapsack Problem With Applications to Optimal Resource Allocation, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[5] B. John Oommen, et al. Discretized estimator learning automata, 1992, IEEE Trans. Syst. Man Cybern.

[6] B. John Oommen, et al. Continuous and discretized pursuit learning schemes: various algorithms and their comparison, 2001, IEEE Trans. Syst. Man Cybern. Part B.

[7] Yuguang Fang, et al. Stochastic Channel Selection in Cognitive Radio Networks, 2007, IEEE GLOBECOM 2007 - IEEE Global Telecommunications Conference.

[8] B. John Oommen, et al. Continuous Learning Automata Solutions to the Capacity Assignment Problem, 2000, IEEE Trans. Computers.

[9] B. John Oommen, et al. Using Stochastic AI Techniques to Achieve Unbounded Resolution in Finite Player Goore Games and its Applications, 2007, 2007 IEEE Symposium on Computational Intelligence and Games.

[10] B. John Oommen, et al. Generalized pursuit learning schemes: new families of continuous and discretized learning automata, 2002, IEEE Trans. Syst. Man Cybern. Part B.

[11] B. John Oommen, et al. Solving Stochastic Nonlinear Resource Allocation Problems Using a Hierarchy of Twofold Resource Allocation Automata, 2010, IEEE Transactions on Computers.

[12] Hamid Beigy, et al. Adaptation of parameters of BP algorithm using learning automata, 2000, Proceedings. Vol.1. Sixth Brazilian Symposium on Neural Networks.

[13] B. John Oommen, et al. The Bayesian Pursuit Algorithm: A New Family of Estimator Learning Automata, 2011, IEA/AIE.

[14] Cem Unsal, et al. Multiple Stochastic Learning Automata for Vehicle Path Control in an Automated Highway System, 1999.

[15] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[16] Kumpati S. Narendra, et al. Learning automata - an introduction, 1989.

[17] B. John Oommen, et al. Using the Theory of Regular Functions to Formally Prove the ε-Optimality of Discretized Pursuit Learning Algorithms, 2014, IEA/AIE.

[18] B. John Oommen. A Learning Automaton Solution to the Stochastic Minimum-Spanning Circle Problem, 1986, IEEE Transactions on Systems, Man, and Cybernetics.

[19] Omkar J. Tilak, et al. On ε-Optimality of the Pursuit Learning Algorithm, 2012, J. Appl. Probab.

[20] Omkar J. Tilak, et al. On epsilon-optimality of the pursuit learning algorithm, 2011, ArXiv.

[21] B. John Oommen, et al. A formal proof of the ε-optimality of absorbing continuous pursuit algorithms using the theory of regular functions, 2014, Applied Intelligence.

[22] Kanagasabai Rajaraman, et al. Finite time analysis of the pursuit algorithm for learning automata, 1996, IEEE Trans. Syst. Man Cybern. Part B.

[23] B. John Oommen. Absorbing and Ergodic Discretized Two-Action Learning Automata, 1986, IEEE Transactions on Systems, Man, and Cybernetics.

[24] B. John Oommen, et al. String taxonomy using learning automata, 1997, IEEE Trans. Syst. Man Cybern. Part B.

[25] B. John Oommen, et al. Graph Partitioning Using Learning Automata, 1996, IEEE Trans. Computers.

[26] B. John Oommen, et al. On Using the Theory of Regular Functions to Prove the ε-Optimality of the Continuous Pursuit Learning Automaton, 2013, IEA/AIE.

[27] Pushkin Kachroo, et al. Multiple stochastic learning automata for vehicle path control in an automated highway system, 1999, IEEE Trans. Syst. Man Cybern. Part A.

[28] B. John Oommen, et al. Discretized Bayesian Pursuit - A New Scheme for Reinforcement Learning, 2012, IEA/AIE.

[29] B. John Oommen, et al. On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata, 2013, Applied Intelligence.

[30] B. John Oommen, et al. Discretized pursuit learning automata, 1990, IEEE Trans. Syst. Man Cybern.

[31] Leslie Pack Kaelbling, et al. Inferring finite automata with stochastic output functions and an application to map learning, 1992, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).