Developing higher-order networks with empirically selected units

This paper introduces a class of simple polynomial neural network classifiers called mask perceptrons, and outlines a series of algorithms for developing such structures in practice. To combat the exponential explosion in the number of higher-order monomial terms to choose from, the approach relies on ordering the input attributes by their potential usefulness and on heuristic-driven generation and selection of hidden units (monomial terms). Test results are given for two popular machine learning benchmark domains (mushroom classification and the faulty LED display) and for two nonstandard domains (spoken digit recognition and article category determination), and all results are compared against a number of other classifiers. A procedure for converting a mask perceptron into classical logic production rules is outlined; after training on 6-20% of a database, it produces a number of simple rules that are 100% accurate.
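The development pipeline summarized above (rank the attributes, generate monomial hidden units from the most useful ones, select a subset, then train a linear output unit) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: inputs are assumed to be 0/1 attributes and labels +1/-1; the usefulness score (absolute difference of class-conditional means) and the helper names `rank_attributes`, `candidate_masks`, `select_masks` and `train_perceptron` are hypothetical; and the output unit is trained with the classical perceptron rule as a stand-in.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Mask = Tuple[int, ...]  # indices of the 0/1 inputs multiplied together by one hidden unit


def monomial(x: Sequence[int], mask: Mask) -> int:
    """Hidden unit: product of the selected 0/1 inputs (a logical AND)."""
    out = 1
    for i in mask:
        out *= x[i]
    return out


def _score(values: List[int], y: Sequence[int]) -> float:
    """Hypothetical usefulness score: |mean over class +1 - mean over class -1|.
    Assumes both classes occur in y."""
    pos = [v for v, t in zip(values, y) if t == 1]
    neg = [v for v, t in zip(values, y) if t == -1]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def rank_attributes(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[int]:
    """Order attribute indices from most to least (heuristically) useful."""
    n = len(X[0])
    scores = [_score([x[i] for x in X], y) for i in range(n)]
    return sorted(range(n), key=lambda i: -scores[i])  # stable: ties keep index order


def candidate_masks(order: List[int], top_k: int, max_degree: int) -> List[Mask]:
    """Generate monomials only over the top_k ranked attributes, up to max_degree."""
    pool = sorted(order[:top_k])
    return [m for d in range(1, max_degree + 1) for m in combinations(pool, d)]


def select_masks(X, y, masks: List[Mask], max_units: int) -> List[Mask]:
    """Keep the max_units best-scoring masks; ties go to lower-degree terms."""
    scored = [(-_score([monomial(x, m) for x in X], y), len(m), m) for m in masks]
    return [m for _, _, m in sorted(scored)[:max_units]]


def train_perceptron(X, y, masks: List[Mask], epochs: int = 1000):
    """Classical perceptron rule on the monomial features (labels +1/-1)."""
    w, b = [0.0] * len(masks), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            phi = [monomial(x, m) for m in masks]
            s = b + sum(wi * p for wi, p in zip(w, phi))
            if (1 if s > 0 else -1) != t:  # misclassified: move weights toward t
                w = [wi + t * p for wi, p in zip(w, phi)]
                b += t
                errors += 1
        if errors == 0:  # every training example is now classified correctly
            break
    return w, b


def predict(x: Sequence[int], masks: List[Mask], w, b) -> int:
    s = b + sum(wi * monomial(x, m) for wi, m in zip(w, masks))
    return 1 if s > 0 else -1


if __name__ == "__main__":
    # Toy data: class +1 iff attributes 1 AND 3 are set; attributes 0 and 2 are noise.
    X = [list(bits) for bits in product([0, 1], repeat=4)]
    y = [1 if x[1] and x[3] else -1 for x in X]
    order = rank_attributes(X, y)
    masks = select_masks(X, y, candidate_masks(order, top_k=2, max_degree=2), max_units=1)
    w, b = train_perceptron(X, y, masks)
    print("attribute order:", order, "selected units:", masks, "weights:", w, "bias:", b)
```

Restricting candidates to the `top_k` ranked attributes is what keeps the search tractable: over n binary attributes there are C(n,1) + ... + C(n,d) monomials of degree at most d, while only the corresponding sum over `top_k` attributes is generated here. In the toy run, the single surviving unit `(1, 3)` reads directly as the rule IF x1 AND x3 THEN class +1.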
