Generalized Constraint Neural Network Regression Model Subject to Linear Priors

This paper reports an extension of our previous investigations on adding transparency to neural networks. We focus on a class of linear priors (LPs), such as symmetry, ranking list, boundary, and monotonicity, each of which can be expressed as either a linear-equality or a linear-inequality prior. A generalized constraint neural network with linear priors (GCNN-LPs) model is studied. Compared with existing modeling approaches, the GCNN-LPs model offers two advantages. First, any LP is embedded in an explicitly structural mode, which may add a higher degree of transparency than a purely algorithmic mode. Second, a direct elimination and least squares approach is adopted to solve the model; in experiments, it outperforms Lagrange multiplier techniques in both accuracy and computational cost. Specific attention is paid to both “hard” (strictly satisfied) and “soft” (weakly satisfied) constraints for regression problems. Numerical investigations are made on synthetic examples as well as on real-world datasets. Simulation results demonstrate the effectiveness of the proposed modeling approach in comparison with other existing approaches.
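
To make the direct elimination and least squares idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian RBF network whose output is linear in the weights w, and a single hard boundary prior f(0) = 0, written as the linear equality A w = b. The function names, the center placement, the width, and the synthetic data are all illustrative assumptions. The sketch also assumes that the leading m columns of A form an invertible block, so that the first m weights can be eliminated.

```python
import numpy as np

def rbf_design(x, centers, width):
    # Gaussian RBF design matrix: one row per sample, one column per center.
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / width) ** 2)

def fit_direct_elimination(Phi, y, A, b):
    # Minimise ||Phi w - y||^2 subject to A w = b by direct elimination.
    # Assumption: the leading m x m block of A is invertible.
    m = A.shape[0]
    A1, A2 = A[:, :m], A[:, m:]
    Phi1, Phi2 = Phi[:, :m], Phi[:, m:]
    A1_inv_A2 = np.linalg.solve(A1, A2)
    A1_inv_b = np.linalg.solve(A1, b)
    # Substitute w1 = A1^{-1} (b - A2 w2) into the residual, which leaves
    # an unconstrained least squares problem in the free weights w2.
    Phi_red = Phi2 - Phi1 @ A1_inv_A2
    y_red = y - Phi1 @ A1_inv_b
    w2, *_ = np.linalg.lstsq(Phi_red, y_red, rcond=None)
    w1 = A1_inv_b - A1_inv_A2 @ w2
    return np.concatenate([w1, w2])

# Synthetic data (illustrative only): noisy sine on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.05 * rng.standard_normal(x.size)

centers = np.linspace(0.0, 1.0, 8)  # first center sits at the boundary x = 0
width = 0.2
Phi = rbf_design(x, centers, width)

# Hard boundary prior f(0) = 0, expressed as one linear equality on w.
A = rbf_design(np.array([0.0]), centers, width)  # shape (1, 8)
b = np.array([0.0])

w = fit_direct_elimination(Phi, y, A, b)
boundary_value = (A @ w).item()  # network output at x = 0
```

Because the constraint is removed by substitution before the least squares step, the fitted network satisfies the equality up to floating-point round-off, and no multiplier variables are introduced. Inequality and soft priors, which the paper also treats, are outside the scope of this sketch.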
