Wrappers for performance enhancement and oblivious decision graphs

In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).

For accuracy estimation, we investigate cross-validation and the .632 bootstrap. We show examples where they fail and conduct a large-scale study comparing them. We conclude that repeated runs of five-fold cross-validation give a good tradeoff between bias and variance for the problem of model selection used in later chapters.

We define the wrapper approach and use it for feature subset selection and parameter tuning. We relate definitions of feature relevancy to the set of optimal features, which is defined with respect to both a concept and an induction algorithm. The wrapper approach requires a search space, operators, a search engine, and an evaluation function. We investigate all of them in detail and introduce compound operators for feature subset selection. Finally, we abstract the search problem into search with probabilistic estimates.

We introduce decision tables with a default majority rule (DTMs) to test the conjecture that feature subset selection is a very powerful bias. The accuracy of induced DTMs is surprisingly high, and we conclude that this bias is extremely important for many real-world datasets. We show that the resulting decision tables are very small and can be succinctly displayed.

We study properties of oblivious read-once decision graphs (OODGs) and show that they do not suffer from some inherent limitations of decision trees. We describe a general framework for constructing OODGs bottom-up and specialize it using the wrapper approach. We show that the graphs produced use fewer features than C4.5, the state-of-the-art decision tree induction algorithm, and are usually easier for humans to comprehend.
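
The four ingredients of the wrapper approach named above (search space, operators, search engine, evaluation function) can be illustrated with a small sketch. It is not the dissertation's implementation. It assumes a plain forward hill-climbing search whose only operator is "add one feature", and it uses repeated five-fold cross-validation of a decision table with a default majority rule as the evaluation function. All function names, the number of repeats, and the toy data are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import random


def train_dtm(rows, labels, features):
    """Build a decision table over `features` plus a default majority class."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row[f] for f in features)][label] += 1
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def predict_dtm(model, row, features):
    """Return the majority class of the matching table entry, else the default."""
    table, default = model
    key = tuple(row[f] for f in features)
    if key in table:
        return table[key].most_common(1)[0][0]
    return default


def cv_accuracy(rows, labels, features, folds=5, repeats=3, seed=0):
    """Evaluation function: accuracy averaged over repeated k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(rows)
    correct = 0
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for k in range(folds):
            test_idx = order[k::folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = train_dtm([rows[i] for i in train_idx],
                              [labels[i] for i in train_idx], features)
            correct += sum(predict_dtm(model, rows[i], features) == labels[i]
                           for i in test_idx)
    return correct / (repeats * n)


def wrapper_forward_selection(rows, labels, n_features):
    """Search engine: hill-climb from the empty subset, adding one feature per step."""
    selected = []
    best = cv_accuracy(rows, labels, selected)
    improved = True
    while improved:
        improved = False
        best_f = None
        for f in range(n_features):
            if f in selected:
                continue
            score = cv_accuracy(rows, labels, selected + [f])
            if score > best:  # keep the best strictly improving candidate
                best, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
    return selected, best


# Toy data (assumed): four binary features, the label copies feature 1.
rows = [[(i >> b) & 1 for b in range(4)] for i in range(16)] * 10
labels = [r[1] for r in rows]
selected, acc = wrapper_forward_selection(rows, labels, 4)
print(selected, acc)
```

On this toy data the search keeps only feature 1, since no further feature can strictly improve the estimated accuracy. Forward hill-climbing is only the simplest possible search engine; it can miss features that are useful only in combination, which is one reason to consider richer operators and search strategies.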
