BLR-D: applying bilinear logistic regression to factored diagnosis problems

In this paper, we address a pattern of diagnosis problems in which each of J entities produces the same K features, yet we are only informed of overall faults from the ensemble. Furthermore, we suspect that only certain entities and certain features are leading to the problem. The task, then, is to reliably identify which entities and which features are at fault. Such problems are particularly prevalent in the world of computer systems, in which a datacenter with hundreds of machines, each with the same performance counters, occasionally produces overall faults. In this paper, we present a means of using a constrained form of bilinear logistic regression for diagnosis in such problems. The bilinear treatment allows us to represent the scenarios with J+K instead of JK parameters, resulting in more easily interpretable results and far fewer false positives compared to treating the parameters independently. We develop statistical tests to determine which features and entities, if any, may be responsible for the labeled faults, and use false discovery rate (FDR) analysis to ensure that our values are meaningful. We show results in comparison to ordinary logistic regression (with L1 regularization) on two scenarios: a synthetic dataset based on a model of faults in a datacenter, and a real problem of finding problematic processes/features based on user-reported hangs.

[1]  Michael I. Jordan,et al.  Variational Probabilistic Inference and the QMR-DT Network , 2011, J. Artif. Intell. Res..

[2]  Shan Lu,et al.  Flight data recorder: monitoring persistent-state interactions to improve systems management , 2006, OSDI '06.

[3]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[4]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[5]  Kensei Nagata,et al.  Development of a clinical diagnosis support tool to identify patients with lumbar spinal stenosis , 2007, European Spine Journal.

[6]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[7]  Marcos K. Aguilera,et al.  Performance debugging for distributed systems of black boxes , 2003, SOSP '03.

[8]  Helen J. Wang,et al.  Automatic Misconfiguration Troubleshooting with PeerPressure , 2004, OSDI.

[9]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[10]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Dawson R. Engler,et al.  From uncertainty to belief: inferring the specification within , 2006, OSDI '06.

[12]  Nir Friedman,et al.  On the application of the bootstrap for computing confidence measures on features of induced Bayesian networks , 1999, AISTATS.

[13]  Armando Fox,et al.  Capturing, indexing, clustering, and retrieving system history , 2005, SOSP '05.

[14]  D. Heckerman,et al.  ,81. Introduction , 2022 .

[15]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[16]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[17]  Xiao Ma,et al.  MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs , 2007, SOSP.

[18]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[19]  Lucas C. Parra,et al.  Bilinear Discriminant Component Analysis , 2007, J. Mach. Learn. Res..

[20]  D. Whitefield,et al.  A review of: “Practical Nonpararnetric Statistics. By W. J. CONOVER. (New York: Wiley, 1971.) [Pl" x+462.] £5·25. , 1972 .

[21]  W. J. Conover,et al.  Practical Nonparametric Statistics , 1972 .

[22]  Yuanyuan Zhou,et al.  /*icomment: bugs or bad comments?*/ , 2007, SOSP.

[23]  David Heckerman,et al.  Determining the Number of Non-Spurious Arcs in a Learned DAG Model: Investigation of a Bayesian and a Frequentist Approach , 2007, UAI.

[24]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[25]  Dawson R. Engler,et al.  RacerX: effective, static detection of race conditions and deadlocks , 2003, SOSP '03.

[26]  J. Nocedal Updating Quasi-Newton Matrices With Limited Storage , 1980 .