CloudTraceViz: A Visualization Tool for Tracing Dynamic Usage of Cloud Computing Resources

This paper introduces CloudTraceViz, a visual analytics tool for analyzing the characteristics of modern cloud data centers. The goals of this tool are: 1) to fulfill a set of visual tasks on cloud computing derived from in-depth interviews with domain experts, 2) to visualize and monitor real-world data that is large in both the number of profiles and the number of time steps, and 3) to help system administrators trace and understand the causal relationships in multivariate data. To reach these goals, our system combines several interconnected visual components. A customized heatmap captures the pattern of a single machine as well as a group of machines; progressive rendering of parallel coordinates lets users see the dynamic behavior of running tasks and jobs over time; and scatterplot matrices are used in conjunction with the parallel coordinates for anomaly extraction. Results on the Alibaba Cloud Trace dataset show that the tool helps users gain a high-level overview of a large dataset and understand the causal relations within multivariate data.
