LEARNING EFFECTIVE BDD VARIABLE ORDERS FOR BDD-BASED PROGRAM ANALYSIS

Software reliability and security are in jeopardy. As software has become ubiquitous and its capabilities more complex, code quality has been sacrificed in the race for the next "killer app." In response, program analysis researchers have mounted a revolution: they have developed new tools and methods, underpinned by traditional compilation techniques, to save software from its downward spiral. However, as these tools and analyses have become more sophisticated, they too have suffered from scalability, reliability, and complexity issues. Just as program analysis researchers have set out to solve the problems of software developers, we have set out to solve the problems of program analysis researchers.

The bddbddb (Binary Decision Diagram-Based Deductive DataBase) system has recently made possible many advanced, context-sensitive program analyses. Such analyses can be expressed in bddbddb as Datalog queries, which are quantifiably easier to write than traditional implementations. The bddbddb system's unique compilation mechanisms also yield analyses of exceptional performance. The key to this performance is the use of Binary Decision Diagrams (BDDs), a compact representation that exploits repeated patterns in data, to represent the principal elements of an analysis. However, finding efficient BDD representations (i.e., BDD variable orders) for a particular analysis is nearly impossible to accomplish manually, and in some cases no efficient order exists for a given analysis and its formulation.

This thesis presents an algorithm that helps automate the discovery of efficient BDD representations. Our technique reformulates the search for BDD variable orders as an active learning process over the space of BDD variable orders and their execution times. The technique revolves around an iterative process of carefully sampling new orders and then reducing the search space by extracting features from high-performing orders. The dominant features of the sampled variable orders can then be used to generate new orders, refine existing orders, or even revise analysis formulations. The variable orders generated by our algorithm outperform those obtained after months of manual exploration. More importantly, our results make bddbddb a viable and valuable tool for program analysis researchers in their quest for better-quality code.
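
The sample-then-narrow loop described above can be illustrated with a small, self-contained sketch. It is not the thesis's algorithm: as assumptions made purely for illustration, cost is the node count of a reduced ordered BDD (rather than the execution time of an analysis), the features are pairwise "u comes before v" relations, and the search space is narrowed by keeping only the relations shared by the best orders seen so far. All function names and parameters here are hypothetical.

```python
import itertools
import random


def robdd_size(func, order):
    """Count internal nodes of the reduced ordered BDD of func under order.

    Level i holds one node per distinct subfunction (obtained by fixing the
    first i variables) that still depends on order[i].
    """
    n, total = len(order), 0
    for i in range(n):
        seen = set()
        for prefix in itertools.product((0, 1), repeat=i):
            env = dict(zip(order[:i], prefix))
            table = tuple(
                func({**env, **dict(zip(order[i:], suffix))})
                for suffix in itertools.product((0, 1), repeat=n - i)
            )
            half = len(table) // 2  # first half: order[i] = 0, second: 1
            if table[:half] != table[half:]:
                seen.add(table)
        total += len(seen)
    return total


def precedence_features(order):
    """Assumed feature encoding: every pair (u, v) with u placed before v."""
    return {(u, v) for i, u in enumerate(order) for v in order[i + 1:]}


def sample_order(variables, constraints, rng):
    """Draw a random order that respects every (u, v) 'u before v' pair."""
    remaining, order = list(variables), []
    while remaining:
        ready = [v for v in remaining
                 if not any((u, v) in constraints for u in remaining if u != v)]
        pick = rng.choice(ready)
        order.append(pick)
        remaining.remove(pick)
    return tuple(order)


def learn_order(func, variables, rounds=4, samples=15, keep=5, seed=0):
    """Iteratively sample orders, then restrict sampling to shared features."""
    rng = random.Random(seed)
    costs, constraints = {}, set()
    for _ in range(rounds):
        for _ in range(samples):
            order = sample_order(variables, constraints, rng)
            if order not in costs:
                costs[order] = robdd_size(func, order)
        best = sorted(costs, key=costs.get)[:keep]
        # Keep only the precedence relations common to all of the best orders.
        constraints = set.intersection(*(precedence_features(o) for o in best))
    return min(costs.items(), key=lambda kv: kv[1])


def f(env):
    """Toy function (a1 AND b1) OR (a2 AND b2) OR (a3 AND b3)."""
    return ((env["a1"] and env["b1"]) or (env["a2"] and env["b2"])
            or (env["a3"] and env["b3"]))


interleaved = ["a1", "b1", "a2", "b2", "a3", "b3"]
separated = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(robdd_size(f, interleaved))  # 6 internal nodes
print(robdd_size(f, separated))    # 14 internal nodes
print(learn_order(f, separated))   # best sampled (order, cost) pair
```

Even on this six-variable toy function, the two hand-written orders differ in size by more than a factor of two, which suggests why manual exploration becomes impractical for real analyses. Intersecting the features of the best orders is a deliberately crude way to shrink the search space and can lock in a poor constraint early.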
