Integrating POMDP and SARSA(λ) for Service Composition with Incomplete Information

As a powerful computing paradigm for constructing complex distributed applications, service composition is usually addressed as a planning problem, since the goal is to find an optimal path for combining services to satisfy specific requirements. Some planning methods assume that the state of the running environment can be fully observed and monitored. However, the dynamic Internet environment and the opacity of internal status, such as QoS attributes and invocation results, make this assumption too strict to be generally applicable. In this paper, we introduce a Partially Observable Markov Decision Process (POMDP) to model service composition, which treats the environment as partially observable and generates a policy from incomplete information. Partial observability relaxes the previous assumption and can handle the difficulties that arise in a dynamic and unpredictable environment. Based on this model, we propose a reinforcement learning algorithm to compute the optimal strategy. We conduct a series of experiments to verify the proposed algorithm and compare it with two other algorithms. The results demonstrate the correctness and effectiveness of our algorithm.
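To make the combination of the two techniques in the title concrete, the following is a minimal sketch, not the authors' implementation: a SARSA(λ) learner with accumulating eligibility traces that acts directly on observations (a memoryless policy for the POMDP) in a toy two-service composition task, where each service's internal QoS (a fluctuating load) is hidden from the agent. The environment, all names (ToyServiceEnv, sarsa_lambda), and the numeric parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

class ToyServiceEnv:
    """Hypothetical partially observable service-composition task.

    The agent completes N_STAGES workflow stages by invoking one of two
    candidate services per stage. Outcomes depend on a hidden, fluctuating
    load (internal QoS) that is never observed; the agent sees only the
    current stage index.
    """
    N_STAGES = 3
    ACTIONS = (0, 1)  # candidate services

    def reset(self):
        self.stage = 0
        return self.stage  # observation = visible workflow stage only

    def step(self, action):
        # service 0 is more reliable on average (0.8 vs 0.5), but a
        # hidden load fluctuation perturbs every invocation
        base = 0.8 if action == 0 else 0.5
        hidden_load = random.uniform(0.0, 0.2)  # unobserved QoS noise
        success = random.random() < base - hidden_load
        reward = 1.0 if success else -0.2
        if success:
            self.stage += 1
        return self.stage, reward, self.stage == self.N_STAGES


def sarsa_lambda(env, episodes=5000, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1):
    """SARSA(lambda) over raw observations (a memoryless POMDP policy)."""
    Q = defaultdict(float)  # Q[(observation, action)]

    def choose(obs):
        # epsilon-greedy action selection over observed stage
        if random.random() < epsilon:
            return random.choice(env.ACTIONS)
        return max(env.ACTIONS, key=lambda a: Q[(obs, a)])

    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility traces, reset per episode
        obs = env.reset()
        act = choose(obs)
        done = False
        while not done:
            nxt_obs, reward, done = env.step(act)
            td_error = reward - Q[(obs, act)]
            nxt_act = None
            if not done:
                nxt_act = choose(nxt_obs)
                td_error += gamma * Q[(nxt_obs, nxt_act)]
            traces[(obs, act)] += 1.0  # accumulating trace
            for key in list(traces):
                Q[key] += alpha * td_error * traces[key]
                traces[key] *= gamma * lam  # decay all traces
            obs, act = nxt_obs, nxt_act
    return Q


if __name__ == "__main__":
    Q = sarsa_lambda(ToyServiceEnv())
    for stage in range(ToyServiceEnv.N_STAGES):
        best = max(ToyServiceEnv.ACTIONS, key=lambda a: Q[(stage, a)])
        print(f"stage {stage}: prefer candidate service {best}")
```

Acting on observations rather than maintained belief states keeps the policy memoryless; the eligibility traces help credit assignment when the observation alone does not fully identify the underlying state, which is the role SARSA(λ) plays under partial observability.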
