Microservice Fingerprinting and Classification using Machine Learning

Application-aware data centers promise benefits for data center management in resource provisioning, power estimation, network management, and security protection. However, the highly dynamic and heterogeneous nature of emerging microservices makes it challenging for data center operators to accurately identify which applications tenants have deployed. In this paper, we address the problem of fingerprinting microservices in a unified, efficient, accurate, and non-intrusive fashion. To this end, we characterize the runtime behavior of microservices using lightweight eBPF-based system call tracing. To accurately fingerprint a diverse set of microservices from their system call activity, we use a machine learning approach that combines Bayesian learning and LSTM autoencoders. We demonstrate that our approach can fingerprint many real-world microservices with 99% accuracy while using 1–2% additional CPU resources, and can detect the presence of previously unseen microservices with near-perfect accuracy.
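To make the idea of classifying services from system call activity concrete, the following is a minimal sketch, not the method of the paper. It assumes that each trace has already been collected as a list of system call names, and it swaps in a simple multinomial naive Bayes classifier with Laplace smoothing and a uniform prior as a stand-in for the Bayesian learning component; the LSTM autoencoder is not shown. The function names `train` and `classify`, the service labels, and the example traces are all hypothetical.

```python
import math
from collections import Counter


def train(traces):
    """Fit per-label system call log-probabilities.

    traces: dict mapping a service label to a list of traces,
            where each trace is a list of system call names.
    Returns (model, vocab); model[label][syscall] is a log-probability.
    """
    # Vocabulary: every system call name seen in any training trace.
    vocab = {s for seqs in traces.values() for seq in seqs for s in seq}
    model = {}
    for label, seqs in traces.items():
        counts = Counter(s for seq in seqs for s in seq)
        # Laplace (add-one) smoothing so unseen calls keep nonzero mass.
        total = sum(counts.values()) + len(vocab)
        model[label] = {s: math.log((counts[s] + 1) / total) for s in vocab}
    return model, vocab


def classify(model, vocab, seq):
    """Return the label with the highest log-likelihood for one trace.

    Assumes a uniform prior over labels; system calls outside the
    training vocabulary are ignored.
    """
    scores = {
        label: sum(logp[s] for s in seq if s in vocab)
        for label, logp in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training traces for two made-up service labels.
example_traces = {
    "web": [
        ["accept", "read", "write", "close"],
        ["accept", "read", "write", "write", "close"],
    ],
    "db": [
        ["pread64", "pwrite64", "fsync"],
        ["pread64", "pread64", "pwrite64", "fsync"],
    ],
}

model, vocab = train(example_traces)
print(classify(model, vocab, ["accept", "read", "write"]))
print(classify(model, vocab, ["pread64", "fsync"]))
```

A bag-of-calls model like this ignores ordering, which is one reason a sequence model such as an LSTM autoencoder is a natural complement: it can capture temporal structure, and its reconstruction error offers one possible signal for flagging previously unseen services.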
