Big Data Spectra Analysis Using Analytical Programming and Random Decision Forests

This paper focuses on spectra analysis of large datasets. We first discuss a method suited to spectra analysis, analytical programming (AP), and its implementation. Our goal is to derive mathematical formulas for the emission lines in spectra that are characteristic of Be stars. The core of this task is symbolic regression, that is, the process of finding the mathematical formula that best fits the measured data. In the past this was purely a human domain; today there are computational methods that can do it more or less effectively. Analytical programming is a novel method of symbolic regression, compared with genetic programming and grammatical evolution. The aim of this work is to verify the efficiency of a parallel implementation of this algorithm on the CUDA architecture. We then discuss a parallel implementation of random decision forests (RDF) for classifying huge amounts of varied spectra. The mathematical formulas obtained via AP will be used to reduce the number of attributes of the explored spectra. Our goal is to propose a scalable algorithm for classifying such data, one that preferably needs only a single pass over the data while maintaining acceptable accuracy. Later we will try to create a module compatible with the Virtual Observatory (VO) and the DAta Mining and Exploration project.
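To make the notion of symbolic regression concrete, the following minimal Python sketch scores a few candidate formulas against a measured profile and keeps the one with the smallest squared error. It is an illustration only and not the analytical programming algorithm itself: the candidate set, the function names, and the synthetic Gaussian-shaped "emission line" are all assumptions made for this example, and a real search would generate candidates by an evolutionary algorithm rather than from a fixed list.

```python
import math

# Hypothetical candidate formulas that a symbolic-regression search might propose.
# Each maps a wavelength offset x to a predicted flux.
CANDIDATES = {
    "constant": lambda x: 1.0,
    "linear": lambda x: 1.0 + 0.1 * x,
    "gaussian_peak": lambda x: 1.0 + 2.0 * math.exp(-x * x / 2.0),
}


def make_synthetic_line(n=41):
    """Build a synthetic emission-line-like profile (illustrative, not real spectra)."""
    xs = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]
    ys = [1.0 + 2.0 * math.exp(-x * x / 2.0) for x in xs]
    return xs, ys


def sum_squared_error(formula, xs, ys):
    """Fitness of one candidate: sum of squared residuals against the data."""
    return sum((formula(x) - y) ** 2 for x, y in zip(xs, ys))


def best_candidate(candidates, xs, ys):
    """Score every candidate and return the best name plus all scores."""
    scores = {name: sum_squared_error(f, xs, ys) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    xs, ys = make_synthetic_line()
    name, scores = best_candidate(CANDIDATES, xs, ys)
    print("best formula:", name)
    for candidate, err in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{candidate:>14}: SSE = {err:.4f}")
```

Because each candidate is scored independently of the others, this fitness evaluation is the kind of step that lends itself to a parallel implementation.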
