Application classification through monitoring and learning of resource consumption patterns

Application awareness is an important factor in efficient resource scheduling. This paper introduces a novel approach to application classification based on principal component analysis (PCA) and the k-nearest neighbor (k-NN) classifier, intended to assist scheduling in heterogeneous computing environments. PCA reduces the dimensionality of the performance feature space, and the k-NN classifier assigns applications to classes based on the extracted features. The classification considers four dimensions: CPU-intensive, I/O and paging-intensive, network-intensive, and idle. Application class information and statistical abstracts of application behavior are learned over historical runs and used to assist multi-dimensional resource scheduling. The paper describes a prototype classifier for application-centric virtual machines. Experimental results show that scheduling decisions made with the assistance of application class information improved system throughput by 22.11% on average for a set of three benchmark applications.
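
The two-stage pipeline described in the abstract (PCA for feature extraction, then k-NN for classification) can be illustrated with a minimal sketch. This is not the paper's prototype: the four metric columns, the training values, the number of retained components, k = 3, and the helper names `fit_pca`, `project`, `knn_classify`, and `classify` are all assumptions made for illustration. PCA is approximated here by power iteration with deflation so the example needs only the Python standard library.

```python
import math
from collections import Counter

def fit_pca(rows, n_components):
    """Estimate the top principal components by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered metrics.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _component in range(n_components):
        start = [1.0 / (j + 1) for j in range(d)]  # fixed start keeps runs repeatable
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(500):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next-largest direction.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components

def project(row, mean, components):
    """Map one raw metric snapshot into the reduced feature space."""
    return [sum((row[j] - mean[j]) * comp[j] for j in range(len(row)))
            for comp in components]

def knn_classify(query, points, labels, k=3):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]

# Hypothetical snapshots, all on a 0-100 scale (assumed columns):
# [cpu utilization, disk I/O activity, paging activity, network activity]
TRAIN = [
    [95, 2, 1, 3], [90, 5, 2, 1], [92, 3, 0, 2],        # CPU-intensive
    [10, 90, 40, 2], [15, 85, 50, 3], [12, 88, 45, 1],  # I/O and paging-intensive
    [20, 5, 1, 90], [25, 3, 2, 85], [22, 4, 0, 88],     # network-intensive
    [2, 1, 0, 1], [3, 2, 1, 2], [1, 0, 0, 1],           # idle
]
LABELS = ["cpu"] * 3 + ["io_paging"] * 3 + ["network"] * 3 + ["idle"] * 3

mean, components = fit_pca(TRAIN, n_components=3)
reduced = [project(row, mean, components) for row in TRAIN]

def classify(snapshot, k=3):
    """Classify a new snapshot using the model learned from TRAIN."""
    return knn_classify(project(snapshot, mean, components), reduced, LABELS, k)

print(classify([88, 4, 1, 3]))
```

Because the illustrative metrics share a common 0-100 scale, the sketch skips normalization; with real monitoring data in mixed units, standardizing each metric before PCA would likely be needed.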
