Artificial intelligence in the service of system administrators

The LHCb online system relies on a large and heterogeneous IT infrastructure made up of thousands of servers running many different applications. These applications perform a great variety of tasks, from critical ones such as data taking to secondary ones such as web servers. Administering such a system and making sure it works properly represents a heavy workload for the small team of expert operators. Research into automating (some) system administration tasks began in 2001, when IBM defined the so-called "self objectives" that are supposed to lead to "autonomic computing". In this context, we present a framework that uses artificial intelligence and machine learning to monitor and diagnose Linux-based systems and their interaction with software, at a low level and in a non-intrusive way. Moreover, the multi-agent approach we use, coupled with an "object-oriented paradigm" architecture, should considerably increase our learning speed and highlight relations between problems.
