Learning Factorizations in Estimation of Distribution Algorithms Using Affinity Propagation

Estimation of distribution algorithms (EDAs) that use marginal product model factorizations have been applied to a broad range of optimization problems, most of them binary. In this paper, we introduce the affinity propagation EDA (AffEDA), which learns a marginal product model by clustering a matrix of mutual information values estimated from the data, using an efficient message-passing algorithm known as affinity propagation. The algorithm is tested on a set of binary and nonbinary decomposable functions and on a hard class of combinatorial problems known as the HP protein model. The results show that AffEDA is an efficient alternative to other EDAs that use marginal product model factorizations, such as the extended compact genetic algorithm (ECGA), and that it improves on the quality of the results achieved by ECGA when the cardinality of the variables is increased.
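The matrix of pairwise mutual information that serves as input to the clustering step can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: the function names `mutual_information` and `mutual_information_matrix`, the toy population, and the choice to leave the diagonal at zero are all hypothetical. It works for variables of any finite cardinality and uses only the standard library. In an affinity-propagation setting, a matrix like this one would plausibly play the role of the similarity matrix, with the resulting clusters read as the factors of the marginal product model.

```python
from collections import Counter
from math import log


def mutual_information(population, i, j):
    """Empirical mutual information (in nats) between variables i and j."""
    n = len(population)
    joint = Counter((x[i], x[j]) for x in population)
    marg_i = Counter(x[i] for x in population)
    marg_j = Counter(x[j] for x in population)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def mutual_information_matrix(population):
    """Symmetric matrix of pairwise mutual information.

    The diagonal is left at 0.0 as a placeholder (an assumption of this
    sketch); how self-similarities are set is not stated in the abstract.
    """
    n_vars = len(population[0])
    matrix = [[0.0] * n_vars for _ in range(n_vars)]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            matrix[i][j] = matrix[j][i] = mutual_information(population, i, j)
    return matrix


# Hypothetical toy population: variables 0 and 1 are always equal,
# while variable 2 is independent of both.
population = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
mi = mutual_information_matrix(population)
```

In this toy case the entry for variables 0 and 1 is large relative to the entries involving variable 2, so a clustering of the matrix would be expected to group variables 0 and 1 into one factor and leave variable 2 on its own.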
