Distributed Monitoring with Collaborative Prediction

Isolating users from the inevitable faults in large distributed systems is critical to Quality of Experience. We formulate probe selection for fault prediction, based on end-to-end probing, as a Collaborative Prediction (CP) problem. On an extensive experimental dataset from the EGI grid, combining the Maximum Margin Matrix Factorization approach to CP with Active Learning shows excellent performance, typically reducing the number of probes by 80% to 90%.
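The idea of treating probe outcomes as a partially observed matrix can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. It assumes a matrix `Y` of probe results with entries +1 (success) and -1 (failure), and a boolean `mask` marking which probes were actually run. In place of the semidefinite-programming formulation of Maximum Margin Matrix Factorization, it fits two low-rank factors by subgradient descent on a hinge loss with Frobenius-norm regularization, which is a simplification. The active-learning step uses one common heuristic, assumed here: probe next where the predicted score is closest to zero. The names `fit_factors` and `next_probes` and all hyperparameters are hypothetical.

```python
import numpy as np

def fit_factors(Y, mask, rank=2, reg=0.1, lr=0.05, epochs=500, seed=0):
    """Fit sign(U @ V.T) to the observed entries of Y using a hinge loss."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        X = U @ V.T
        # Hinge subgradient: -y on observed entries whose margin y*x is below 1.
        G = np.where(mask & (Y * X < 1), -Y, 0.0)
        # Simultaneous update; the reg terms shrink both factors.
        U, V = U - lr * (G @ V + reg * U), V - lr * (G.T @ U + reg * V)
    return U, V

def next_probes(U, V, mask, k=1):
    """Return the k unobserved (row, col) pairs with the smallest |score|."""
    score = np.abs(U @ V.T)
    score[mask] = np.inf  # never re-select a probe that was already run
    flat = np.argsort(score, axis=None)[:k]
    return [tuple(int(i) for i in np.unravel_index(f, score.shape)) for f in flat]

# Toy usage: 6 hypothetical sources x 5 hypothetical targets, 4 probes not yet run.
rows = np.array([1, -1, 1, 1, -1, 1.0])
cols = np.array([1, 1, -1, 1, -1.0])
Y = np.outer(rows, cols)
mask = np.ones_like(Y, dtype=bool)
mask[0, 1] = mask[2, 3] = mask[4, 0] = mask[5, 4] = False

U, V = fit_factors(Y, mask)
predicted = np.sign(U @ V.T)           # predicted outcome for every pair
to_run = next_probes(U, V, mask, k=2)  # most uncertain pairs to probe next
```

In this sketch, only the pairs returned by `next_probes` would be probed for real; their results are added to `Y` and `mask`, and the factors are refit. Savings in probe count come from predicting the remaining entries instead of measuring them.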
