Cross-entropic learning of a machine for the decision in a partially observable universe

This paper addresses optimal decision-making in a partially observable universe. Our approach directly approximates an optimal strategic tree, which depends on the observations, by means of a parameterized probability law. A particular family of Hidden Markov Models (HMMs) with inputs and outputs serves as the policy model. We propose and apply a method for optimizing the parameters of these HMMs, based on the cross-entropy (CE) method for rare-event simulation developed by Rubinstein.
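To make the cross-entropy idea concrete, here is a minimal sketch of a CE optimization loop in Python. It is an illustration under simplifying assumptions, not the method of the paper: the policy is a memoryless table of action probabilities per observation rather than an HMM with input and output, and the function name `cross_entropy_optimize`, the toy `score` function, and all parameter values are hypothetical.

```python
import random

def cross_entropy_optimize(score, n_obs, n_act, n_samples=200,
                           elite_frac=0.1, smoothing=0.7, n_iters=30, seed=0):
    """Toy CE loop over a table probs[o][a] = P(action a | observation o).

    Assumption: a memoryless policy table stands in for the HMM policy model.
    """
    rng = random.Random(seed)
    # Start from the uniform law over actions for every observation.
    probs = [[1.0 / n_act] * n_act for _ in range(n_obs)]
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample candidate policies: one action per observation.
        samples = [
            [rng.choices(range(n_act), weights=probs[o])[0] for o in range(n_obs)]
            for _ in range(n_samples)
        ]
        # Keep the best-scoring candidates (the "elite" set).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Move the law toward the empirical frequencies of the elite set.
        for o in range(n_obs):
            for a in range(n_act):
                freq = sum(1 for pol in elite if pol[o] == a) / n_elite
                probs[o][a] = smoothing * freq + (1 - smoothing) * probs[o][a]
    return probs

# Hypothetical toy problem: the reward counts agreements with a hidden target policy.
target = [0, 2, 1, 2]

def score(policy):
    return sum(1 for p, t in zip(policy, target) if p == t)

probs = cross_entropy_optimize(score, n_obs=len(target), n_act=3)
best = [max(range(3), key=lambda a: row[a]) for row in probs]
```

The smoothing factor keeps every probability strictly positive, which limits premature convergence to a poor policy. In the setting of the paper, the sampled objects would be drawn from the parameterized HMM policy and scored by simulating the partially observable process.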
