A Machine Learning Approach to Performance Prediction of Total Order Broadcast Protocols

Total Order Broadcast (TOB) is a fundamental building block at the core of many strongly consistent, fault-tolerant replication schemes. While it is widely known that the performance of existing TOB algorithms varies greatly depending on the workload and deployment scenario, the problem of forecasting their performance in realistic settings remains, to date, largely unexplored. In this paper we address this problem by exploring the use of machine learning techniques to build, in a fully decentralized fashion, performance models of TOB protocols. Based on an extensive experimental study considering heterogeneous workloads and multiple TOB protocols, we assess the accuracy and efficiency of alternative machine learning methods, including neural networks, support vector machines, and decision tree-based regression models. We also propose two heuristics for the feature selection phase that reduce its execution time by up to two orders of magnitude while incurring only a very limited loss of prediction accuracy.
