A data mining based approach to reliable distributed systems

This paper opens a new research perspective on reliable distributed systems. The underlying hypothesis is that dynamic models of distributed systems can be built by applying data mining techniques to data gathered while observing those systems. Feeding observations from an on-line monitoring process into such a model makes it possible to predict upcoming reliability and performance problems, so that the user or the system can take preventive measures. We present the general approach and elaborate a concrete scenario that applies it to distributed data mining algorithms.

Keywords: data mining; distributed systems; reliability
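The two phases described above (building a model from gathered observations, then feeding on-line observations into it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's method: the metrics (CPU load, queue length), the sample values, the names `Stump`, `fit_stump` and `monitor`, and the choice of a one-feature threshold rule as a stand-in for a real data mining model.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple

Observation = Sequence[float]  # hypothetical metrics: (cpu_load, queue_length)


@dataclass
class Stump:
    """One-feature threshold rule; a stand-in for a mined system model."""
    feature: int
    threshold: float

    def predicts_problem(self, obs: Observation) -> bool:
        return obs[self.feature] > self.threshold


def fit_stump(history: List[Tuple[Observation, bool]]) -> Stump:
    """Offline phase: pick the feature/threshold pair that classifies the
    most historical observations correctly. The label says whether a
    problem followed the observation."""
    best, best_correct = None, -1
    for f in range(len(history[0][0])):
        for candidate, _ in history:
            t = candidate[f]
            correct = sum((o[f] > t) == label for o, label in history)
            if correct > best_correct:
                best, best_correct = Stump(f, t), correct
    return best


def monitor(model: Stump, stream: Iterable[Observation]) -> Iterator[int]:
    """On-line phase: yield the index of every observation for which the
    model predicts an upcoming problem."""
    for i, obs in enumerate(stream):
        if model.predicts_problem(obs):
            yield i


# Illustrative data only (not taken from the paper).
history = [
    ([0.20, 3], False),
    ([0.35, 5], False),
    ([0.50, 4], False),
    ([0.70, 40], True),
    ([0.60, 55], True),
    ([0.30, 60], True),
]
model = fit_stump(history)
stream = [[0.25, 4], [0.40, 30], [0.30, 2], [0.55, 70]]
alerts = list(monitor(model, stream))
```

In this toy data, queue length separates the labelled observations while CPU load does not, so the fitted rule flags the second and fourth streamed observations. A realistic setting would replace the threshold rule with a learned classifier or time-series model and trigger a preventive action instead of collecting indices.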
