
Towards a Scalable Distributed Fitness Evaluation Service

Organizations across the globe gather more and more data. Large datasets require new approaches to analysis and processing, including methods based on machine learning. Symbolic regression in particular can provide many useful insights. Unfortunately, its high resource requirements can make it infeasible for large datasets. In this paper we analyze a bottleneck in an open-source implementation of this method, which we call hubert, and identify the evaluation of individuals as its most costly operation. As a solution to this problem, we propose a new evaluation service based on the Apache Spark framework, which attempts to speed up computations by distributing them across a cluster of machines. We compare the performance of the new service with that of the original implementation by analyzing execution times for a number of samples. We then discuss how the computation time improves as the amount of available resources increases. Finally, we draw conclusions and outline plans for further research.
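The abstract states that fitness evaluation dominates the cost of symbolic regression and that the proposed service distributes it over a cluster, but it does not say how the work is split. The following is a minimal sketch, assuming one possible decomposition: the dataset is split into partitions, each worker computes a partial sum of squared errors for one candidate expression, and the partial results are combined into a mean squared error. The expression-tree encoding, mean squared error as the fitness measure, and the names `evaluate`, `partition`, `partial_error` and `fitness_mse` are all hypothetical illustrations, not hubert's or the paper's actual design; local threads stand in for cluster workers. In PySpark the same map/reduce shape would roughly correspond to `sc.parallelize(data, n).mapPartitions(...)` followed by `reduce`, though the paper may instead distribute individuals rather than data.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple, Union

# Hypothetical encoding of an individual: the variable "x", a numeric constant,
# or a tuple (op, left, right) naming a binary operator.
Expr = Union[str, float, tuple]
Sample = Tuple[float, float]  # (input x, target y)

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(expr: Expr, x: float) -> float:
    """Recursively evaluate an expression tree at a single input value."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return float(expr)
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))


def partition(data: List[Sample], n_parts: int) -> List[List[Sample]]:
    """Split the dataset into at most n_parts chunks of roughly equal size."""
    if not data:
        return []
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_error(expr: Expr, chunk: List[Sample]) -> Tuple[float, int]:
    """Map step: sum of squared errors and sample count for one partition."""
    sse = sum((evaluate(expr, x) - y) ** 2 for x, y in chunk)
    return sse, len(chunk)


def fitness_mse(expr: Expr, data: List[Sample], n_parts: int = 4) -> float:
    """Mean squared error of expr over data, computed partition by partition."""
    if not data:
        raise ValueError("dataset must not be empty")
    chunks = partition(data, n_parts)
    # Threads stand in for cluster workers; each handles one partition.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(lambda c: partial_error(expr, c), chunks))
    # Reduce step: combine per-partition sums, then normalise once.
    total_sse = sum(s for s, _ in partials)
    total_n = sum(n for _, n in partials)
    return total_sse / total_n


if __name__ == "__main__":
    data = [(float(x), float(x * x + 1)) for x in range(10)]
    print(fitness_mse(("add", ("mul", "x", "x"), 1.0), data))  # x*x + 1 → 0.0
    print(fitness_mse(("mul", "x", "x"), data))                # x*x → 1.0
```

Combining raw sums and counts, rather than averaging per-partition means, keeps the result independent of how unevenly the data happens to be split.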

Wlodzimierz Funika, Paweł Koperek
