Learning Sorting and Decision Trees with POMDPs

Partially observable Markov decision processes (POMDPs) are general models of sequential decision problems in which both actions and observations can be probabilistic. Many problems of interest can be formulated as POMDPs, yet the use of POMDPs has been limited by the lack of effective algorithms. Recently this has started to change, and a number of problems, such as robot navigation and planning, are beginning to be formulated and solved as POMDPs. The advantage of the POMDP approach is its clean semantics and its ability to produce principled solutions that integrate physical and information-gathering actions. In this paper we pursue this approach in the context of two learning tasks: learning to sort a vector of numbers and learning decision trees from data. Both problems are formulated as POMDPs and solved by a general POMDP algorithm. The main lessons and results are that 1) the use of suitable heuristics and representations allows for the solution of sorting and classification POMDPs of non-trivial sizes, 2) the quality of the resulting solutions is competitive with the best algorithms, and 3) problematic aspects in decision tree learning such as test and misclassification costs, noisy tests, and missing values are naturally accommodated.
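To make the idea of probabilistic observations concrete, the following is a minimal sketch of the standard Bayesian belief update that underlies POMDP models in general. It is not the algorithm used in the paper. All names here (`belief_update`, `static_transition`, `noisy_test_observation`) and the 0.8 test accuracy are assumptions chosen for illustration: a hidden class label is probed by a noisy test that leaves the state unchanged.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Belief = Dict[State, float]


def belief_update(
    belief: Belief,
    action: str,
    observation: str,
    transition: Callable[[State, str], Dict[State, float]],
    observation_model: Callable[[State, str, str], float],
) -> Belief:
    """Return b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    # Prediction step: push the current belief through the transition model.
    predicted: Belief = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t

    # Correction step: weight each predicted state by the observation likelihood.
    unnormalized = {
        s: observation_model(s, action, observation) * p
        for s, p in predicted.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: the hidden state is a class label, "pos" or "neg".
def static_transition(state: State, action: str) -> Dict[State, float]:
    # Assumption: running a test never changes the hidden class.
    return {state: 1.0}


def noisy_test_observation(state: State, action: str, observation: str) -> float:
    # Assumption: the test reports the true class with probability 0.8.
    return 0.8 if observation == state else 0.2


prior = {"pos": 0.5, "neg": 0.5}
after_one = belief_update(prior, "test", "pos", static_transition, noisy_test_observation)
after_two = belief_update(after_one, "test", "pos", static_transition, noisy_test_observation)
print(after_one["pos"], after_two["pos"])
```

Under these assumptions, one positive test result raises the belief in "pos" from 0.5 to 0.8, and a second raises it to 16/17. This illustrates how an information-gathering action changes the agent's belief without changing the world.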
