Exploiting multicores to optimize business process execution

While modern CPUs offer an increasing number of cores with shared caches, prevailing execution engines for business processes, workflows, or Web service compositions have not been optimized to exploit the abundant processing resources of such CPUs. One factor limiting performance is inefficient thread scheduling by the operating system, which can result in suboptimal use of shared caches. In this paper, we study the performance of the JOpera business process execution engine on a recent multicore machine. By analyzing the engine's architecture and binding threads that are likely to access shared data to cores with a common cache, we achieve speedups of up to 13% for a variety of workloads. The only change to the engine is the binding of threads to CPUs; its architecture and implementation are otherwise unmodified. As the engine is implemented in Java, we provide a new Java library to manage thread bindings and hardware performance counters. We also use hardware performance counters to explain the observed speedups in our performance analysis.
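
The core idea of the abstract, placing threads that are likely to share data on cores with a common cache, can be illustrated with a small planning sketch. This is not the library described in the paper, whose API is not given here. The class name `AffinityPlanner`, the thread names, and the cache topology are all hypothetical, and the sketch only computes which cores each thread would be allowed to run on. Applying such a plan requires an OS-specific mechanism, since standard Java has no thread-affinity API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: computes a thread-to-core plan, it does not bind threads.
public class AffinityPlanner {

    /**
     * Assigns each group of data-sharing threads to one cache domain
     * (a set of cores with a common cache), round-robin over the domains.
     * Returns, for every thread name, the cores it would be restricted to.
     */
    public static Map<String, List<Integer>> plan(List<List<String>> threadGroups,
                                                  List<List<Integer>> cacheDomains) {
        if (cacheDomains.isEmpty()) {
            throw new IllegalArgumentException("at least one cache domain is required");
        }
        Map<String, List<Integer>> allowedCores = new LinkedHashMap<>();
        for (int g = 0; g < threadGroups.size(); g++) {
            // All threads of one group share the same domain, hence the same cache.
            List<Integer> domain = cacheDomains.get(g % cacheDomains.size());
            for (String thread : threadGroups.get(g)) {
                allowedCores.put(thread, new ArrayList<>(domain));
            }
        }
        return allowedCores;
    }

    public static void main(String[] args) {
        // Assumed topology: cores 0-1 share one cache, cores 2-3 share another.
        List<List<Integer>> cacheDomains = List.of(List.of(0, 1), List.of(2, 3));
        // Assumed grouping: threads in the same inner list access shared data.
        List<List<String>> threadGroups = List.of(
                List.of("worker-a1", "worker-a2"),
                List.of("worker-b1", "worker-b2"));
        plan(threadGroups, cacheDomains).forEach((thread, cores) ->
                System.out.println(thread + " -> cores " + cores));
    }
}
```

Round-robin assignment is the simplest possible policy. A real deployment would derive the groups from an analysis of the engine's architecture, as the abstract describes, and would read the cache topology from the operating system.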
