Optimization of Self-Organizing Maps Ensemble in Prediction

The knowledge discovery process runs into difficulties when analyzing large amounts of data: theoretical problems tied to high-dimensional spaces appear and degrade the predictive capacity of algorithms. In this paper, we propose a new methodology to obtain a better representation and prediction of huge datasets. For that purpose, an ensemble approach is used to overcome the problems of high-dimensional spaces. Self-Organizing Maps, which allow both fast learning and navigation through the data, are used as base classifiers, each learning a different feature subspace. A genetic algorithm optimizes the diversity of the ensemble thanks to an adapted error measure. Experiments show that this measure helps to construct a concise ensemble that keeps its representation capabilities. Furthermore, the approach is competitive in prediction with Boosting and Random Forests.
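To make the ensemble idea concrete, the following is a minimal sketch, not the paper's actual method: each member is a toy one-dimensional SOM trained on a random feature subspace, units are labeled by majority vote of the points they win, and the ensemble predicts by voting across members. The genetic-algorithm diversity optimization described in the abstract is omitted here; all class names, parameters, and the synthetic setup are illustrative assumptions.

```python
import math
import random

class SOM:
    """Minimal 1-D self-organizing map used as a base classifier (illustrative only)."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        # Random codebook vectors in [0, 1]^dim.
        self.units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
        self.labels = [None] * n_units

    def _bmu(self, x):
        # Best-matching unit: index of the closest codebook vector.
        return min(range(len(self.units)),
                   key=lambda i: sum((u - v) ** 2 for u, v in zip(self.units[i], x)))

    def fit(self, X, y, epochs=20, lr0=0.5, radius0=2.0):
        for t in range(epochs):
            lr = lr0 * (1 - t / epochs)                 # decaying learning rate
            radius = max(radius0 * (1 - t / epochs), 0.5)  # shrinking neighborhood
            for x in X:
                b = self._bmu(x)
                for i, unit in enumerate(self.units):
                    # Gaussian neighborhood on the 1-D unit grid.
                    h = math.exp(-((i - b) ** 2) / (2 * radius ** 2))
                    for d in range(len(unit)):
                        unit[d] += lr * h * (x[d] - unit[d])
        # Label each unit by the majority class of the points it wins.
        votes = [{} for _ in self.units]
        for x, label in zip(X, y):
            v = votes[self._bmu(x)]
            v[label] = v.get(label, 0) + 1
        for i, v in enumerate(votes):
            self.labels[i] = max(v, key=v.get) if v else None

    def predict(self, x):
        return self.labels[self._bmu(x)]

class SOMEnsemble:
    """Ensemble of SOMs, each trained on a random feature subspace."""

    def __init__(self, n_members=5, subspace_size=2, n_units=8, seed=0):
        self.rng = random.Random(seed)
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.n_units = n_units
        self.members = []  # list of (feature indices, trained SOM)

    def fit(self, X, y):
        dim = len(X[0])
        for m in range(self.n_members):
            feats = self.rng.sample(range(dim), min(self.subspace_size, dim))
            som = SOM(self.n_units, len(feats), seed=m)
            som.fit([[x[f] for f in feats] for x in X], y)
            self.members.append((feats, som))

    def predict(self, x):
        # Majority vote over members; unlabeled-unit answers are skipped.
        votes = {}
        for feats, som in self.members:
            p = som.predict([x[f] for f in feats])
            if p is not None:
                votes[p] = votes.get(p, 0) + 1
        return max(votes, key=votes.get)
```

In the paper's setting, a genetic algorithm would then search over such subspace/member configurations, scoring candidate ensembles with the adapted error measure so that accuracy and diversity are balanced.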
