Theius: A Streaming Visualization Suite for Hadoop Clusters*

As cloud computing clusters continue to grow, maintaining their health becomes increasingly challenging. Recent work has studied how to efficiently monitor the status of machines in these clusters and how to detect problems or predict them before they occur, yet little work has addressed the bottleneck between when failures occur and when they are fixed: system administrators. As monitoring and failure detection systems mature, we are able to extract tremendous amounts of information about the status of the system in real time. However, this volume of data is difficult for human beings to understand, especially those inexperienced with the particular cluster. In this paper, we introduce Theius, a web-based visualization suite that allows system administrators to quickly understand the state of the cloud system as a whole. We outline the key features of this visualization tool and show that it is more intuitive and easier to use than Ganglia, a state-of-the-art visualization tool for clusters. We also demonstrate that our tool can scale, presenting a use case in which our visualization displays a 5000-node cluster. Although our tool is implemented for Hadoop clusters, our contribution generalizes to any cloud computing system.