Personalizing influence diagrams: applying online learning strategies to dialogue management

We consider the problem of adapting the parameters of an influence diagram online for real-time personalization. This problem arises when we use the influence diagram repeatedly to make decisions while remaining uncertain about its parameters. We describe learning algorithms that address it. In particular, we show how to adapt several explore-versus-exploit strategies known to work well for Markov decision processes to the more general influence-diagram model. As an illustration, we describe how our techniques for online personalization allow a voice-enabled browser to adapt to a particular speaker for spoken dialogue management, and we evaluate all of the explore-versus-exploit strategies in this domain.
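
To make the online-adaptation loop concrete, the sketch below shows one explore-versus-exploit strategy of the kind the abstract refers to, Thompson sampling, applied to the simplest possible influence diagram: a single decision node, a single chance node with an unknown conditional distribution, and a utility node. The action names, outcome utilities, and simulated environment are illustrative assumptions, not details from the paper; a realistic dialogue-management diagram would contain multiple decision and chance nodes and would need a full influence-diagram solver in place of the one-step expected-utility computation shown here.

```python
import numpy as np

# Hypothetical single-decision influence diagram: one decision node
# (the dialogue action), one chance node (the outcome), and a utility
# node. The CPT P(outcome | action) is unknown, so we keep a Dirichlet
# posterior per action and choose actions by Thompson sampling.

rng = np.random.default_rng(0)

ACTIONS = ["confirm", "ask_again", "execute"]  # illustrative dialogue moves
OUTCOMES = 3                                   # e.g. success / partial / failure
UTILITY = np.array([1.0, 0.2, -1.0])           # assumed utility of each outcome

# Dirichlet pseudo-counts: one vector of outcome counts per action.
alpha = {a: np.ones(OUTCOMES) for a in ACTIONS}

def choose_action():
    """Thompson sampling: draw a CPT from the posterior, then act
    greedily in the sampled influence diagram."""
    best_a, best_eu = None, -np.inf
    for a in ACTIONS:
        theta = rng.dirichlet(alpha[a])        # sampled P(outcome | a)
        eu = theta @ UTILITY                   # expected utility under the sample
        if eu > best_eu:
            best_a, best_eu = a, eu
    return best_a

def update(action, outcome):
    """Conjugate posterior update after observing the chance node."""
    alpha[action][outcome] += 1.0

# One interaction step (the true outcome model is simulated here).
a = choose_action()
outcome = rng.choice(OUTCOMES)                 # stand-in for the real environment
update(a, outcome)
```

Because the Dirichlet is conjugate to the multinomial chance node, each observation updates the posterior by a single count increment, which is what makes this style of real-time personalization cheap enough to run inside a dialogue turn.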
