Boosted ARTMAP: Modifications to fuzzy ARTMAP motivated by boosting theory

This paper proposes several modifications to the Fuzzy ARTMAP neural network architecture for classification in complex, possibly noisy, environments. The goal is to improve the generalization performance of Fuzzy ART-based neural networks, such as Fuzzy ARTMAP, in these settings. A major difficulty in applying Fuzzy ARTMAP to such learning problems is over-fitting of the training data. Structural risk minimization is a machine-learning framework that addresses over-fitting by providing both a basis for analysis and an impetus for designing better learning algorithms. Its theory reveals a trade-off between training error and classifier complexity in reducing generalization error, and the learning algorithms proposed in this paper exploit that trade-off. Boosted ART extends Fuzzy ART by allowing the spatial extent of each cluster to be adjusted independently. Boosted ARTMAP generalizes Fuzzy ARTMAP by allowing non-zero training error in an effort to reduce hypothesis complexity and hence improve overall generalization performance. Although Boosted ARTMAP is, strictly speaking, not a boosting algorithm, its changes were motivated by the goals one strives to achieve when employing boosting. Boosted ARTMAP is an on-line learner, does not require excessive parameter tuning, and reduces precisely to Fuzzy ARTMAP for particular parameter values. The paper also describes Structural Boosted ARTMAP, which uses both Boosted ART and Boosted ARTMAP to perform structural risk minimization learning. Structural Boosted ARTMAP allows comparison of off-line versus on-line learning, and of empirical risk minimization versus structural risk minimization, using Fuzzy ARTMAP-based neural network architectures. Both empirical and theoretical results are presented to enhance the understanding of these architectures.
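
To make the idea of a cluster's "spatial extent" concrete, the following is a minimal sketch of standard Fuzzy ART clustering (complement coding, choice function, vigilance test, and learning update), not the Boosted ART algorithm itself. The class name `FuzzyART`, the method `train_one`, and the per-category vigilance list `rhos` with its `rho_new` argument are hypothetical names introduced here; storing one vigilance value per category is only an assumed illustration of how cluster size could be controlled independently, and the paper's actual mechanism may differ.

```python
def complement_code(a):
    """Map a in [0,1]^d to the complement-coded input [a, 1 - a]."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, y):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(x, y)]


class FuzzyART:
    """Illustrative Fuzzy ART clusterer (hypothetical helper, not from the paper)."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho        # default vigilance; higher means smaller clusters
        self.alpha = alpha    # choice parameter
        self.beta = beta      # learning rate (1.0 is fast learning)
        self.weights = []     # one weight vector per category
        self.rhos = []        # ASSUMPTION: one vigilance value per category

    def _choice(self, I, j):
        w = self.weights[j]
        return sum(fuzzy_and(I, w)) / (self.alpha + sum(w))

    def train_one(self, a, rho_new=None):
        """Present one pattern; return the index of the category that learns it."""
        I = complement_code(a)
        norm_I = sum(I)
        # Search categories in order of decreasing choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: -self._choice(I, j))
        for j in order:
            w = self.weights[j]
            m = fuzzy_and(I, w)
            # Vigilance test against this category's own threshold.
            if sum(m) / norm_I >= self.rhos[j]:
                self.weights[j] = [self.beta * mi + (1.0 - self.beta) * wi
                                   for mi, wi in zip(m, w)]
                return j
        # No category passed the test: commit a new one.
        self.weights.append(I)
        self.rhos.append(self.rho if rho_new is None else rho_new)
        return len(self.weights) - 1


points = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]]

strict = FuzzyART(rho=0.8)
strict_labels = [strict.train_one(p) for p in points]

loose = FuzzyART(rho=0.1)
loose_labels = [loose.train_one(p) for p in points]
```

With a high vigilance the distant third point is forced into its own category, while a low vigilance lets a single category grow to cover all three points. This is the complexity side of the trade-off described above: fewer, larger categories give a simpler hypothesis at the possible cost of training error.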
