Towards automated performance diagnosis in a large IPTV network

IPTV is increasingly being deployed and offered as a commercial service to residential broadband customers. Compared with traditional ISP networks, an IPTV distribution network (i) typically adopts a hierarchical rather than mesh-like structure, (ii) imposes more stringent requirements on both reliability and performance, (iii) uses different distribution protocols (which make heavy use of IP multicast) and exhibits different traffic patterns, and (iv) faces more serious scalability challenges in managing millions of network elements. These characteristics make effective management of IPTV networks and services very challenging. In this paper, we focus on characterizing and troubleshooting performance issues in one of the largest IPTV networks in North America. We collect a large amount of measurement data from a wide range of sources, including device usage and error logs, user activity logs, video quality alarms, and customer trouble tickets. We develop a novel diagnosis tool called Giza that is specifically tailored to the enormous scale and hierarchical structure of the IPTV network. Giza applies multi-resolution data analysis to quickly detect and localize regions in the IPTV distribution hierarchy that are experiencing serious performance problems. Giza then uses several statistical data mining techniques to troubleshoot the identified problems and diagnose their root causes. Validation against operational experience demonstrates the effectiveness of Giza in detecting important performance issues and identifying interesting dependencies. The methodology and algorithms in Giza promise to be of great use in IPTV network operations.
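
To illustrate what localizing problem regions in a distribution hierarchy can look like, the following is a minimal sketch of a generic hierarchical heavy hitter pass. It is not Giza's actual algorithm, which the abstract does not specify. The function name `hierarchical_heavy_hitters`, the level names (region, office, home), the event paths, and the threshold are all hypothetical, and the sketch assumes every event path has the same depth.

```python
from collections import defaultdict


def hierarchical_heavy_hitters(events, threshold):
    """Flag hierarchy prefixes whose discounted event count meets the threshold.

    events: iterable of equal-length tuples, each a root-to-leaf path
            (hypothetical levels: region, office, home).
    Returns a list of (prefix, count) pairs, deepest level first.
    """
    # Count events per leaf path.
    residual = defaultdict(int)
    for path in events:
        residual[tuple(path)] += 1

    depth = max((len(p) for p in residual), default=0)
    flagged = []

    # Walk from the finest resolution up to the coarsest.
    for _ in range(depth):
        parent_residual = defaultdict(int)
        for prefix, count in residual.items():
            if count >= threshold:
                # Heavy at this resolution: report it and do not
                # charge these events to its ancestors.
                flagged.append((prefix, count))
            else:
                # Not heavy on its own: roll the count up one level.
                parent_residual[prefix[:-1]] += count
        residual = parent_residual

    return flagged


if __name__ == "__main__":
    # Hypothetical error events, one tuple per event.
    sample = (
        [("r1", "o1", "h1")] * 5
        + [("r1", "o2", "h2")] * 2
        + [("r1", "o2", "h3")] * 2
        + [("r2", "o3", "h4")]
    )
    for prefix, count in hierarchical_heavy_hitters(sample, threshold=4):
        print("/".join(prefix), count)
```

In this sketch a single home is reported when it is heavy on its own, while an office is reported only when the events from its remaining homes add up to the threshold. Reporting each problem at the finest level that explains it is the usual motivation for discounting flagged descendants.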
