Understanding Sequential Decisions via Inverse Reinforcement Learning

When an agent carries out complex activities composed of sequences of simpler actions, it often has to trade off conflicting functions that must be optimized. These functions capture satisfaction, short-term and long-term objectives, costs, and individual preferences. How these functions are weighted is usually unknown, even to the decision maker. Yet if we could understand individual motivations, and compare such motivations across individuals, we could actively change the environment to increase satisfaction and/or improve performance. In this work, we address the problem of producing high-level, intelligible descriptions of an agent's motivations from observations of that agent carrying out a series of complex activities (referred to as sequential decisions in this work). We propose a novel algorithm for the analysis of observational records, together with a methodology that lets researchers converge toward a summary description of the agent's behaviors by minimizing an error measure between the current description and the observed behaviors. The approach is validated both on a synthetic dataset modeling the motivations of a passenger in a public transportation network and on real taxi drivers' behaviors recorded during their trips in an urban network. Our results show that the method is not only useful but also substantially outperforms previous methods in accuracy, efficiency, and scalability.
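
To make the behavior-matching idea concrete, the sketch below shows a generic feature-expectation-matching loop for inverse reinforcement learning on a toy tabular MDP: reward weights are adjusted until the behavior they induce agrees with the observed trajectories. This is only an illustrative sketch of the general technique, not the algorithm proposed in this paper; the chain environment, `expert_trajectories`, and all other identifiers are hypothetical stand-ins.

```python
# Feature-expectation-matching IRL on a toy tabular MDP.
# Illustrative only: the chain "environment" and the expert trajectories below
# are hypothetical and stand in for a real transportation network.
import numpy as np

GAMMA = 0.9                    # discount factor
HORIZON = 8                    # length of observed trajectories / rollouts
N_STATES, N_ACTIONS = 5, 2     # toy chain of "locations"; actions: stay / move right
FEATURES = np.eye(N_STATES)    # one indicator feature per state

def transition(s, a):
    """Deterministic toy dynamics: action 1 moves right, action 0 stays."""
    return min(s + 1, N_STATES - 1) if a == 1 else s

def greedy_policy(reward, n_iters=200):
    """Value iteration, returning the greedy policy for the current reward."""
    V = np.zeros(N_STATES)
    for _ in range(n_iters):
        Q = np.array([[reward[s] + GAMMA * V[transition(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def rollout_feature_expectations(policy, start=0):
    """Discounted feature counts obtained by following `policy` from `start`."""
    mu, s = np.zeros(N_STATES), start
    for t in range(HORIZON):
        mu += (GAMMA ** t) * FEATURES[s]
        s = transition(s, policy[s])
    return mu

def observed_feature_expectations(trajectories):
    """Discounted feature counts averaged over the observed trajectories."""
    mus = []
    for traj in trajectories:
        mu = np.zeros(N_STATES)
        for t, s in enumerate(traj):
            mu += (GAMMA ** t) * FEATURES[s]
        mus.append(mu)
    return np.mean(mus, axis=0)

# Hypothetical observations: the agent keeps moving toward the last state.
expert_trajectories = [[0, 1, 2, 3, 4, 4, 4, 4]] * 3
mu_observed = observed_feature_expectations(expert_trajectories)

w = np.zeros(N_STATES)                       # reward weights to be recovered
for _ in range(100):
    policy = greedy_policy(FEATURES @ w)     # behavior induced by current description
    gap = mu_observed - rollout_feature_expectations(policy)
    if np.linalg.norm(gap) < 1e-3:           # description matches observations
        break
    w += 0.1 * gap                           # shrink the error measure

print("recovered reward weights:", np.round(w, 2))
print("greedy policy under recovered reward:", greedy_policy(FEATURES @ w))
```

On this toy chain a couple of updates suffice for the greedy policy under the recovered weights to reproduce the observed "keep moving toward the last state" behavior; a realistic setting would replace the toy dynamics with the actual road network and richer state features.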
