Algorithm selection for sorting and probabilistic inference: a machine learning-based approach

The algorithm selection problem is to select the best algorithm for a given computational problem instance according to characteristics of that instance. In this dissertation, we first present theoretical results on the algorithm selection problem. We show, by Rice's theorem, that no automatic algorithm selection program can exist that is based only on the description of the input instance and the competing algorithms. We also use a characterization of instance hardness and algorithm performance based on Kolmogorov complexity to show that algorithm selection for search is incomputable as well. Driven by these theoretical results, we propose an inductive approach to the algorithm selection problem that combines experimental algorithmic methods with machine learning techniques. Experimentally, we have applied the proposed methodology to algorithm selection for sorting and for the Most Probable Explanation (MPE) problem in Bayesian networks. In sorting, instances with existing order are easier for some algorithms. We have studied different presortedness measures, designed algorithms that generate permutations with a specified amount of existing order uniformly at random, and applied various learning algorithms to induce sorting algorithm selection models from experimental runtime results. For the MPE problem, the instance characteristics we have studied include the size and topological type of the network, network connectedness, the skewness of the distributions in the Conditional Probability Tables (CPTs), and the proportion and distribution of evidence variables. The MPE algorithms considered include an exact algorithm (clique-tree propagation), two stochastic sampling algorithms (MCMC Gibbs sampling and importance forward sampling), two search-based algorithms (multi-restart hill-climbing and tabu search), and one hybrid algorithm combining sampling and search (ant colony optimization). Another major contribution of this dissertation is the discovery of multifractal properties of the joint probability distributions of Bayesian networks. With sufficient asymmetry in the individual prior and conditional probability distributions, the joint distribution is not only highly skewed but also has clusters of high-probability instantiations at all scales. We present a two-phase hybrid random sampling and search algorithm that exploits this clustering property to solve the MPE problem. Since the decision version of the MPE problem is NP-complete, the multifractal meta-heuristic can be applied to other NP-hard combinatorial optimization problems as well.
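To make the notion of "existing order" concrete, the following is a minimal Python sketch of three presortedness measures that are standard in the adaptive sorting literature: Inv (number of inversions), Runs (number of maximal ascending runs), and Rem (number of elements to remove to leave a sorted subsequence). The abstract does not say which measures the dissertation uses, so the choice of measures, the function names, and the assumption of distinct keys are illustrative only.

```python
from bisect import bisect_left


def inversions(seq):
    """Inv: number of pairs (i, j) with i < j and seq[i] > seq[j].

    Counted with a merge sort in O(n log n) time.
    """
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, i, j = [], 0, 0
        inv = inv_left + inv_right
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of left.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(list(seq))[1]


def runs(seq):
    """Runs: number of maximal ascending runs (1 for a sorted, non-empty input)."""
    seq = list(seq)
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)


def rem(seq):
    """Rem: len(seq) minus the length of a longest strictly increasing subsequence.

    Assumes distinct keys (hypothetical simplification for this sketch).
    """
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(list(seq)) - len(tails) if not isinstance(seq, list) else len(seq) - len(tails)


if __name__ == "__main__":
    perm = [3, 1, 2, 5, 4]
    print(inversions(perm), runs(perm), rem(perm))
```

Values like these could serve as instance features for a learned selection model: all three measures are zero or minimal on sorted input and grow as the input becomes less ordered.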
