Continuous-time MAXQ Algorithm for Web Service Composition

Web service composition provides a technology for building complex service applications from individual (atomic) services: through composition, distributed applications and enterprise business processes can be integrated from independently developed service components. In this paper, we concentrate on the optimization problem of dynamic web service composition, and our goal is to find an optimal composition policy. Unlike many traditional composition methods, which do not scale to large continuous-time processes, we introduce a hierarchical reinforcement learning technique, a continuous-time unified MAXQ algorithm, to solve large-scale web service composition problems modeled as continuous-time semi-Markov decision processes (SMDPs) under either the average- or the discounted-cost criterion. The proposed algorithm avoids the “curse of modeling” and the “curse of dimensionality” that arise in the optimization process. Finally, we use a travel reservation scenario as an example to illustrate the effectiveness of the proposed algorithm; the simulation results show that it achieves better optimization performance and faster learning speed than flat Q-learning.
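To make the SMDP setting concrete, below is a minimal sketch of the discounted continuous-time Q-learning update that the flat baseline rests on, in the style of Bradtke and Duff's continuous-time Q-learning. All names here (smdp_q_update, BETA, ALPHA, and the travel-domain states and actions) are illustrative assumptions, not the paper's notation.

```python
import math
from collections import defaultdict

BETA = 0.5    # continuous-time discount rate (assumed value)
ALPHA = 0.1   # learning-rate (step-size) parameter (assumed value)

Q = defaultdict(float)  # Q[(state, action)] -> action-value estimate

def smdp_q_update(state, action, reward_rate, tau, next_state, actions):
    """One SMDP Q-learning step after sojourning in `state` for time `tau`.

    reward_rate is the reward accrued per unit time during the sojourn;
    the transition then lands in `next_state`, where `actions` are available.
    """
    # Reward accumulated and discounted over the sojourn interval [0, tau]:
    #   integral_0^tau e^(-BETA*t) * reward_rate dt
    #     = reward_rate * (1 - e^(-BETA*tau)) / BETA
    accrued = reward_rate * (1.0 - math.exp(-BETA * tau)) / BETA
    # Future value is discounted by e^(-BETA*tau), the continuous-time
    # analogue of the one-step gamma in discrete-time Q-learning.
    best_next = max(Q[(next_state, a)] for a in actions)
    target = accrued + math.exp(-BETA * tau) * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Hypothetical usage in the travel-reservation domain: invoking a flight
# service from state "s0" took tau = 2.3 time units at reward rate 1.0
# and led to state "s1".
smdp_q_update("s0", "book_flight", 1.0, 2.3, "s1", ["book_flight", "book_hotel"])
```

The unified MAXQ algorithm goes further than this flat update by decomposing the value function hierarchically in Dietterich's MAXQ style, Q(i, s, a) = V(a, s) + C(i, s, a), where V(a, s) is the value of executing subtask a in state s and C(i, s, a) is a completion function for parent task i, with the same e^(-BETA*tau) discounting applied over each subtask's (random) duration.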
