Using visualizations to monitor changes and harvest insights from a global-scale logging infrastructure at Twitter

Logging user activities is essential to data analysis for internet products and services. Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, producing one of the largest datasets in the organization. This paper describes the challenges and opportunities of applying information visualization to log analysis at this massive scale and shows how various visualization techniques can be adapted to help data scientists extract insights. In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types. We developed two interactive visualizations for these purposes. We discuss the design choices and implementation of these systems, along with case studies of how they are used in day-to-day operations at Twitter.
