Parameter learning from stochastic teachers and stochastic compulsive liars

This paper considers a general learning problem, akin to those studied in the field of learning automata (LA), in which the learning mechanism attempts to learn from a stochastic teacher or a stochastic compulsive liar. More specifically, unlike the traditional LA model, in which the LA attempts to learn the optimal action offered by the Environment (here also called the "Oracle"), this paper considers the problem of a learning mechanism (a robot, an LA, or, in general, an algorithm) attempting to learn a "parameter" within a closed interval. The problem is modeled as follows: the learning mechanism tries to locate an unknown point on a real interval by interacting with a stochastic Environment through a series of informed guesses. For each guess, the Environment informs the mechanism, possibly erroneously, which way it should move to reach the unknown point; its response is correct with probability p. When p > 0.5, the Environment is said to be informative, and this is the case of learning from a stochastic teacher. When p < 0.5, the Environment is deemed deceptive and is called a stochastic compulsive liar. This paper describes a novel learning strategy by which the unknown parameter can be learned in both Environments. These are the first reported results applicable to the latter scenario. The most significant contribution of this paper is that the proposed scheme is shown to operate equally well even when the learning mechanism is unaware of whether the Environment ("Oracle") is informative or deceptive. The learning strategy proposed herein, called CPL-AdS, partitions the search interval into d subintervals, evaluates the location of the unknown point with respect to these subintervals using fast-converging ε-optimal L_RI (linear reward-inaction) LA, and prunes the search space in each iteration by eliminating at least one partition. The CPL-AdS algorithm is shown to provably converge to the unknown point to an arbitrary degree of accuracy, with probability as close to unity as desired. Comprehensive experimental results confirm the fast and accurate convergence of the search for a wide range of values of the Environment's feedback accuracy parameter p; the scheme thus has numerous potential applications.
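
To make the setting concrete, the following is a minimal sketch of the two ingredients named above: an Environment that points toward the unknown point and is correct with probability p, and the standard linear reward-inaction (L_RI) probability update. It is not the CPL-AdS algorithm itself. The names `Environment`, `respond`, `lri_update`, and `fraction_right`, the encoding of directions as +1/-1, and the learning rate `theta` are all assumptions made for illustration.

```python
import random


class Environment:
    """Hypothetical stochastic Oracle: tells the learner which way to move.

    The response is truthful with probability p, so p > 0.5 models a
    stochastic teacher and p < 0.5 a stochastic compulsive liar.
    """

    def __init__(self, target, p, rng=None):
        self.target = target          # the unknown point on the interval
        self.p = p                    # probability of a correct response
        self.rng = rng or random.Random()

    def respond(self, guess):
        """Return +1 ("move right") or -1 ("move left")."""
        # Assumption: a guess exactly on the target is treated as "move left".
        truth = 1 if guess < self.target else -1
        return truth if self.rng.random() < self.p else -truth


def lri_update(probs, chosen, rewarded, theta=0.05):
    """Linear reward-inaction update of an action-probability vector.

    On reward, probability mass moves toward the chosen action;
    on penalty, the vector is left unchanged (the "inaction" part).
    """
    if not rewarded:
        return list(probs)
    updated = [(1.0 - theta) * q for q in probs]
    updated[chosen] += theta
    return updated


def fraction_right(env, guess, n):
    """Fraction of n responses at a fixed guess that say "move right"."""
    return sum(env.respond(guess) == 1 for _ in range(n)) / n
```

With a guess to the left of the unknown point, the fraction of "move right" responses estimates p directly. A value well above 0.5 suggests a teacher and a value well below 0.5 suggests a liar, which illustrates why feedback from a deceptive Environment still carries usable information.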
