ON THE USE OF VARIOUS INPUT SUBSETS FOR STACKED GENERALIZATION

Artificial Neural Networks (ANNs) can be useful for modeling real-world processes such as time-series weather, financial, or chaotic data. The generalization and robustness of these models can be improved, and estimates of the modeling error distributions can be made, using a technique called Stacked Generalization (SG). SG, introduced by Wolpert and used by Kim, Sharkness, Sridhar, and Bartlett, has shown the ability to improve ANN model viability. SG uses a number of diverse ANN models, each of which is trained and queried on independent cross-validation subsets of the process data. These models are then compared, and the resulting information is used in the stacking process to train additional ANNs that combine the models and provide error estimates. However, since the SG improvements and error estimates depend on the diversity of the individual model responses, we add to SG the additional feature of allowing these networks to have different input-variable subspace vectors. Each model's response thus expresses different features of the process to be modeled, and a more complete representation of the overall process is obtained. This ability to use different input vectors improves overall accuracy. Examples are provided that demonstrate the advantages of our modifications to SG.

INTRODUCTION

Artificial neural network (ANN) modeling techniques have many fascinating characteristics. For example, ANNs are capable of learning by example, without explicit knowledge of the underlying system they mimic [Rumelhart, McClelland & the PDP Research Group, 1986; Lippmann, 1987; Blum & Li, 1991; Hecht-Nielsen, 1990; Kurkova, 1992; Haykin, 1999], and they are capable of generalizing this knowledge. Numerous applications of ANNs exist in engineering [Lapedes & Farber, 1987; Narendra & Parthasarathy, 1990; Zhang, Mesirov & Waltz, 1992], process control and planning [Bhat & McAvoy, 1990; Miller, Sutton & Werbos, 1990], plant monitoring [Uhrig, 1989; Upadhyaya & Eryurek, 1992], and fault diagnosis [Venkatasubramanian & Chan, 1989; Bartlett & Uhrig, 1992], among countless others.

Unfortunately, the lack of validation and verification options for ANN methods restricts the areas where they can be applied successfully; only the courageous apply ANNs to important real-world applications. Typically, ANN users simply assume, or hope, that the outputs of their ANNs are reliable. However, the nonlinearity of neural networks provides no guarantees on the behavior of the a posteriori model prediction errors, and without a model error estimate a model's shortfalls go undetected. This paper presents an improved solution to this error-prediction deficiency in the application and use of ANN techniques. Our method, called Modified Series Association (MSA), provides a reliable error estimate on each and every model prediction, and does so in a concise and efficient manner.

Stacked Generalization (SG) was proposed as a method of using multiple models to provide improved accuracy or confidence intervals [Wolpert, 1992]. Wolpert suggests training a number of models, in our case ANNs, on different subsets of the data in order to obtain models that are slightly different. These models are then recalled over the remainder of the data, and this information is used in the stacking process. We go a step further to obtain model diversity: we allow the models to have different input sub-vectors of the input space, as sketched below.
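As a rough illustration of this input-subspace idea, and not the authors' implementation, the following minimal sketch draws a different input sub-vector for each network in the stack; the function name and the sizes are illustrative assumptions.

```python
# Minimal sketch: give each network in the stack its own input
# sub-vector (a subset of the full input dimensions), so each model
# sees a different view of an over-specified input space.
import numpy as np

rng = np.random.default_rng(seed=0)

def make_input_subspaces(n_inputs, n_models, subset_size):
    """Draw a random subset of input dimensions for each model."""
    return [rng.choice(n_inputs, size=subset_size, replace=False)
            for _ in range(n_models)]

# Example: 10 candidate inputs, 5 models, 6 inputs per model.
subspaces = make_input_subspaces(n_inputs=10, n_models=5, subset_size=6)
# Model k is then trained and queried only on X[:, subspaces[k]].
```

In practice the sub-vectors could equally be chosen by domain knowledge rather than at random; the essential point is only that the models view different projections of the input space.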
Obviously, this works well only on problems that are over-specified in the input vector space. The information generated by querying these various networks on their respective cross-validation sets constitutes new information, since the networks are not expected to produce exactly identical results. This information is then used to develop the error-predictor and consolidation models.

SG is a method of developing a stacked model that utilizes the results of two levels of multiple networks, or generalizers. In our case, the first level of models, called the Level 0 networks, are trained not only on partitions of the given data set but also on subspaces of the input vector. The Level 0 models are then queried, or recalled, on the unused, or cross-validation, data, and the results of this recall on the "novel" cross-validation partitions are stored. The second level of models, called the Level 1 models, are then developed using the results of this Level 0 cross-validation recall. These Level 1 models provide the consolidated, or stacked, model output and the corresponding error for each prediction, based on the diverse behavior of the Level 0 models. Once the Level 1 models have been created from the Level 0 cross-validation data sets, the partitioned Level 0 models are discarded. New Level 0 models are then developed on the complete training data set, without partition, each using its appropriate input subspace. These new networks are then used in the stacked recall process to generate consolidated predictions and their associated uncertainties. We have shown this approach to be effective in conducting error analyses on ANN models for pattern recognition [Kim and Bartlett, 1996] and in providing improved functional models of the desired outputs [Sridhar, 1996]. When properly combined, these various models yield reliable error estimates for ANN function approximation, as will be shown in this paper.

MODEL CONSOLIDATION AND ERROR ESTIMATION

ANNs can be regarded as generalizers because they infer parent functions from sets of data [Cybenko, 1989; Wolpert, 1990; Kurkova, 1992]. Most other modeling methods, including statistical and even first-principle methods, can be considered generalizers as well, so the following discussion applies to a large class of modeling methods. It is, however, difficult to find methods for error estimation and consolidation of general data-driven nonlinear modeling techniques such as ANNs, and this is where our MSA technique can be used to full effect [Narendra and Parthasarathy, 1990; Blum & Li, 1991; Bartlett, 1992 & 1994; Bartlett & Kim, 1993].

As a preliminary to our discussion of MSA, we define some useful concepts for the continuous modeling problem of interest. We are given a set of $N$ data exemplars $\{\mathbf{x}, \mathbf{y}\}_N$ of the input vector $\mathbf{x}$ and the output vector $\mathbf{y}$ to be modeled, which may contain noise, for $n = 1, 2, \ldots, N$, where $\mathbf{y}$ is a function of $\mathbf{x}$ and time and $\mathbf{x}$ is itself a function of time. We have

$$\mathbf{y}(t) = F\{\mathbf{x}(t), \mathbf{x}(t-1), \ldots, \mathbf{x}(t-L);\; \mathbf{y}(t-1), \mathbf{y}(t-2), \ldots, \mathbf{y}(t-K)\} \quad (1)$$

where $N$ is the total number of data exemplar patterns at our disposal at the time of model development, and $\mathbf{x}_n(t)$ and $\mathbf{y}_n(t)$ are vectors such that

$$\mathbf{x}_n(t) = \{x_{n,1}(t), x_{n,2}(t), \ldots, x_{n,I}(t)\} \quad (2)$$

and

$$\mathbf{y}_n(t) = \{y_{n,1}(t), y_{n,2}(t), \ldots, y_{n,J}(t)\} \quad (3)$$
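Equations (1)-(3) describe a lagged, NARX-style regressor. As a small, hedged illustration, the sketch below builds the corresponding lagged exemplar matrix; the function name and lag depths are illustrative assumptions, not the authors' code.

```python
# Small illustration of Eqs. (1)-(3): build lagged exemplars so that
# each regressor row holds {x(t), ..., x(t-L); y(t-1), ..., y(t-K)}
# with target y(t). Lag depths L and K are illustrative assumptions.
import numpy as np

def build_lagged(X, Y, L, K):
    """X: (T, I) inputs over time; Y: (T, J) outputs over time."""
    T = len(X)
    start = max(L, K)          # earliest t with a full lag history
    rows, targets = [], []
    for t in range(start, T):
        x_lags = [X[t - l] for l in range(L + 1)]     # x(t) ... x(t-L)
        y_lags = [Y[t - k] for k in range(1, K + 1)]  # y(t-1) ... y(t-K)
        rows.append(np.concatenate(x_lags + y_lags))
        targets.append(Y[t])
    return np.asarray(rows), np.asarray(targets)
```

Each row of the returned matrix is one exemplar $\mathbf{x}_n$; the stacking procedure then operates over these exemplars.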

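Putting the pieces together, the following hedged sketch shows one possible realization of the two-level procedure described above, with scikit-learn's MLPRegressor standing in for the paper's ANNs; the function names, network sizes, and fold count are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the two-level stacking procedure, for a single
# output component y_{n,j} (vector outputs repeat per component).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def stacked_fit(X, y, subspaces, n_folds=5):
    """Train Level 0 nets on CV partitions, a Level 1 combiner on
    their cross-validation recall, then refit Level 0 on all data."""
    level0_recall = np.zeros((len(y), len(subspaces)))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, cv_idx in kf.split(X):
        for k, cols in enumerate(subspaces):
            net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
            # Each Level 0 net sees only its own input subspace.
            net.fit(X[np.ix_(train_idx, cols)], y[train_idx])
            # Recall on the unused ("novel") cross-validation partition.
            level0_recall[cv_idx, k] = net.predict(X[np.ix_(cv_idx, cols)])
    # The Level 1 combiner consolidates the diverse Level 0 responses;
    # a second Level 1 net could be fit to the absolute cross-validation
    # residuals in the same way to supply per-prediction error estimates.
    level1 = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000)
    level1.fit(level0_recall, y)
    # Discard the partitioned Level 0 nets and refit each on the
    # complete training set, still using its own input subspace.
    level0 = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
              .fit(X[:, cols], y) for cols in subspaces]
    return level0, level1

def stacked_predict(level0, level1, subspaces, X):
    """Consolidated (stacked) prediction from the refit Level 0 nets."""
    recall = np.column_stack([net.predict(X[:, cols])
                              for net, cols in zip(level0, subspaces)])
    return level1.predict(recall)
```

Usage would follow the pattern `level0, level1 = stacked_fit(X, Y[:, 0], subspaces)` followed by `stacked_predict(level0, level1, subspaces, X_new)`.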
REFERENCES

[1] A. Lapedes, et al. Nonlinear signal processing using neural networks: Prediction and system modelling, 1987.

[2] David H. Wolpert, et al. A Mathematical Theory of Generalization: Part I, 1990, Complex Systems.

[3] Vera Kurková, et al. Kolmogorov's theorem and multilayer neural networks, 1992, Neural Networks.

[4] K. S. Narendra, et al. Identification and control of dynamic systems using neural networks, 1990.

[5] Donald F. Specht, et al. A general regression neural network, 1991, IEEE Trans. Neural Networks.

[6] Edward K. Blum, et al. Approximation theory and feedforward networks, 1991, Neural Networks.

[7] L. Glass, et al. Oscillation and chaos in physiological control systems, 1977, Science.

[8] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[9] Belle R. Upadhyaya, et al. Application of Neural Networks for Sensor Validation and Plant Monitoring, 1990.

[10] Eric B. Bartlett, et al. Process modeling using stacked neural networks, 1996.

[11] Eric B. Bartlett, et al. Nuclear power plant fault diagnosis using neural networks with error estimation by series association, 1996.

[12] Thomas J. McAvoy, et al. Use of Neural Nets for Dynamic Modeling and Control of Chemical Process Systems, 1989, 1989 American Control Conference.

[13] R. Lippmann, et al. An introduction to computing with neural nets, 1987, IEEE ASSP Magazine.

[14] Chris Aldrich, et al. Combinatorial evolution of regression nodes in feedforward neural networks, 1999, Neural Networks.

[15] D. E. Rumelhart, et al. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1986.

[16] Eric B. Bartlett, et al. Nuclear power plant status diagnostics using an artificial neural network, 1992.

[17] R. E. Uhrig. Use of neural networks in nuclear power plant diagnostics, 1989.

[18] S. Haykin. Neural Networks: A Comprehensive Foundation, 1994.

[19] Eric B. Bartlett, et al. Dynamic node architecture learning: An information theoretic approach, 1994, Neural Networks.

[20] David H. Wolpert, et al. Stacked generalization, 1992, Neural Networks.