Parallel Hybrid Metaheuristics with Distributed Intensification and Diversification for Large-scale Optimization in Big Data Statistical Analysis

Many data science problems that are traditionally analyzed with statistical models can yield important insights when they are reformulated and evaluated within a large-scale optimization framework. However, the theoretical underpinnings of the statistical model may shift the goal of the decision space traversal from the traditional search for a single optimal solution to a traversal intended to yield a set of high-quality, independent solutions. We examine statistical frameworks with astronomically large decision spaces that translate to optimization problems but are challenging for standard optimization methodologies. We address these challenges by designing a hybrid metaheuristic with specialized intensification and diversification protocols in the base search algorithm. We extend the algorithm to a high-performance computing setting on the Stampede2 supercomputer, where we experimentally demonstrate that it can use multiple processors to hill climb collaboratively, broadcast messages to one another about landscape characteristics, diversify across the solution landscape, and request aid in climbing particularly difficult peaks.
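To make the interplay of intensification and diversification concrete, the following is a minimal serial sketch and not the algorithm evaluated in this work. It assumes a toy bitstring objective, and every name in it (`hill_climb`, `diversified_start`, `search`, `two_peaks`) is hypothetical. Intensification is modeled as single-bit-flip hill climbing; diversification is modeled as restarting from the random candidate farthest, in Hamming distance, from the solutions already archived. The result is a set of distinct local optima rather than a single best solution.

```python
import random

def hill_climb(x, score):
    """Intensification: accept improving single-bit flips until none remains."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:]
            y[i] ^= 1  # flip one bit
            if score(y) > score(x):
                x, improved = y, True
    return tuple(x)

def hamming(a, b):
    """Number of positions at which two bitstrings differ."""
    return sum(u != v for u, v in zip(a, b))

def diversified_start(archive, n, rng, tries=20):
    """Diversification: choose the random candidate farthest from the archive."""
    candidates = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(tries)]
    if not archive:
        return candidates[0]
    return max(candidates, key=lambda c: min(hamming(c, a) for a in archive))

def search(score, n, rounds, seed=0):
    """Alternate diversified restarts with hill climbing; return distinct optima."""
    rng = random.Random(seed)
    archive = set()
    for _ in range(rounds):
        start = diversified_start(archive, n, rng)
        archive.add(hill_climb(start, score))
    return sorted(archive, key=score, reverse=True)

def two_peaks(x):
    """Toy objective (an assumption): peaks at all-zeros and all-ones."""
    return max(sum(x), len(x) - sum(x))

if __name__ == "__main__":
    for solution in search(two_peaks, n=12, rounds=8):
        print(solution, two_peaks(solution))
```

In a parallel setting, each processor would run its own climber and exchange archive information by message passing; that communication layer is deliberately omitted here.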
