A behavior cluster based availability prediction approach for nodes in distribution networks

To predict the availability state of a node in a distribution network, its history trace is usually used. When the trace is too short, some usage behavior patterns cannot be captured precisely, which may lead to unreliable predictors. In this paper, to alleviate this data sparseness problem, nodes with similar behaviors are clustered, and the pooled history of a cluster is treated as an additional information source for every node in it. For each node, an N-gram model is trained on the combination of this new source and the node's own trace. In addition, because it is hard to collect the traces of all nodes in large-scale networks such as P2P networks, a bagging-based prediction algorithm is proposed, which can be applied in a distributed environment and reduces the effect of noisy data. In our experiments, three datasets are evaluated. The results show that our cluster-based N-gram predictor outperforms several other predictors, and that the bagging-based prediction algorithm is effective in the distributed environment.
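The idea of backing a node's own sparse trace with the history of its behavior cluster can be illustrated with a minimal Python sketch. It is not the paper's implementation: the binary state encoding (1 = available, 0 = unavailable), the linear interpolation with a fixed `weight`, the uniform fallback for unseen contexts, and the resampling of cluster traces in `bagged_predict` are all assumptions made for illustration, and every function name is hypothetical.

```python
import random
from collections import Counter, defaultdict


def ngram_counts(trace, n):
    """Count how often each (n-1)-state context is followed by each state."""
    counts = defaultdict(Counter)
    for i in range(len(trace) - n + 1):
        context = tuple(trace[i:i + n - 1])
        counts[context][trace[i + n - 1]] += 1
    return counts


def _prob(counts, context, state):
    """Relative frequency of `state` after `context`; 0.5 if context unseen."""
    seen = counts.get(context)
    if not seen:
        return 0.5  # assumption: uniform over the two states
    return seen[state] / sum(seen.values())


def predict_next(own_trace, cluster_traces, n=3, weight=0.5):
    """Predict the next state from the node's trace and its cluster's traces.

    `weight` is a hypothetical mixing parameter, not a value from the paper.
    """
    if n < 2 or len(own_trace) < n - 1:
        raise ValueError("need n >= 2 and at least n-1 observed states")
    own = ngram_counts(own_trace, n)
    cluster = defaultdict(Counter)
    for trace in cluster_traces:
        for ctx, ctr in ngram_counts(trace, n).items():
            cluster[ctx].update(ctr)
    context = tuple(own_trace[-(n - 1):])
    scores = {
        state: weight * _prob(own, context, state)
        + (1 - weight) * _prob(cluster, context, state)
        for state in (0, 1)
    }
    return max(scores, key=scores.get)


def bagged_predict(own_trace, cluster_traces, rounds=11, n=3, seed=0):
    """Majority vote over predictors built from resampled cluster traces."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(rounds):
        # Bootstrap sample: draw traces with replacement (an assumption).
        sample = [rng.choice(cluster_traces) for _ in cluster_traces]
        votes[predict_next(own_trace, sample, n)] += 1
    return votes.most_common(1)[0][0]
```

With a short own trace such as `[1, 0, 1]`, the context `(0, 1)` never appears as a trigram prefix in the node's own history, so the prediction is driven by the cluster's counts. This is the situation the clustering step is meant to help with.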
