Higher Order Q-Learning

Higher order learning is a statistical relational learning framework that leverages relationships between different instances of the same class (Ganiz, Lytkin and Pottenger, 2009). Learning can be supervised or unsupervised. In contrast, reinforcement learning (Q-learning) is a technique for learning in an unknown state space, in which action selection is often based on a greedy or epsilon-greedy approach. The problem with this approach is that it often requires a large amount of initial exploration before convergence. In this article we introduce a novel approach to this problem that treats the state space as a collection of data from which latent information can be extracted. From this data, we classify actions as leading to high reward or low reward, and formulate behaviors based on this classification. We provide experimental evidence that this technique drastically reduces the amount of exploration required in the initial stages of learning. We evaluate our algorithm in a well-known reinforcement learning domain, grid-world.
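For reference, the baseline the abstract describes is tabular Q-learning with epsilon-greedy action selection in a grid-world. The sketch below is a minimal illustration of that baseline only, not of the higher-order classification technique the paper introduces; the grid size, rewards, and learning parameters are hypothetical, as the abstract does not specify the experimental setup.

```python
import random

# Hypothetical 4x4 grid-world: start at (0, 0), goal at (3, 3) with reward +1.
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1         # assumed learning parameters

def step(state, action):
    """Apply an action, clipping moves at the grid boundary."""
    r, c = state
    dr, dc = action
    next_state = (min(max(r + dr, 0), SIZE - 1),
                  min(max(c + dc, 0), SIZE - 1))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def epsilon_greedy(Q, state):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = [Q[(state, a)] for a in range(len(ACTIONS))]
    return values.index(max(values))

def train(episodes=500, seed=0):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    random.seed(seed)
    Q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE)
         for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = (0, 0)
        while state != GOAL:
            a = epsilon_greedy(Q, state)
            next_state, reward = step(state, ACTIONS[a])
            best_next = max(Q[(next_state, b)] for b in range(len(ACTIONS)))
            Q[(state, a)] += ALPHA * (reward + GAMMA * best_next - Q[(state, a)])
            state = next_state
    return Q
```

Note the exploration cost the abstract refers to: with all Q-values initialized to zero, early episodes wander until random moves stumble onto the goal, and only then does reward propagate back through the table. That initial random search is what the paper's classification-based approach aims to reduce.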

[1] Andrew G. Barto et al. Local Bandit Approximation for Optimal Learning Problems, 1996, NIPS.

[2] Ben Taskar et al. Discriminative Probabilistic Models for Relational Data, 2002, UAI.

[3] William M. Pottenger et al. Link Analysis of Higher-Order Paths in Supervised Learning Datasets, 2006.

[4] Stuart J. Russell et al. Bayesian Q-Learning, 1998, AAAI/IAAI.

[5] Lise Getoor et al. Link-Based Classification, 2003, Encyclopedia of Machine Learning and Data Mining.

[6] Jennifer Neville et al. Dependency networks for relational data, 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[7] Jennifer Neville et al. Iterative Classification in Relational Data, 2000.

[8] C. Bishop. The MIT Encyclopedia of the Cognitive Sciences, 1999.

[9] Graham Cormode et al. Applying link-based classification to label blogs, 2007, WebKDD/SNA-KDD '07.

[10] R. B. Bradford. Relationship Discovery in Large Text Collections Using Latent Semantic Indexing, 2006.

[11] William M. Pottenger et al. Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification, 2011, IEEE Transactions on Knowledge and Data Engineering.

[12] Ben Taskar et al. Probabilistic Classification and Clustering in Relational Data, 2001, IJCAI.

[13] William M. Pottenger et al. Mining Higher-Order Association Rules from Distributed Named Entity Databases, 2007, IEEE Intelligence and Security Informatics.

[14] Piotr Indyk et al. Enhanced hypertext categorization using hyperlinks, 1998, SIGMOD '98.

[15] Lise Getoor et al. Link mining: a survey, 2005, SIGKDD Explorations.

[16] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[17] Richard S. Sutton et al. Reinforcement Learning, 1992, Handbook of Machine Learning.

[18] William M. Pottenger et al. A framework for understanding Latent Semantic Indexing (LSI) performance, 2006, Inf. Process. Manag.

[19] R. B. Bradford. Application of Latent Semantic Indexing in Generating Graphs of Terrorist Networks, 2006, ISI.

[20] Karla Gail Conn. Supervised-reinforcement learning for a mobile robot in a real-world environment, 2005.

[21] William M. Pottenger et al. Leveraging Higher Order Dependencies Between Features for Text Classification, 2009.

[22] R. B. Bradford. Exploiting Sensitive Information in Background Mode using Latent Semantic Indexing, 2008.

[23] William M. Pottenger et al. A Higher Order Collective Classifier for detecting and classifying network events, 2009, IEEE International Conference on Intelligence and Security Informatics.