Probabilistic based recursive model for adaptive processing of data structures

One of the most popular frameworks to date for the adaptive processing of data structures was proposed by Frasconi et al. [Frasconi, P., Gori, M., & Sperduti, A. (1998). A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(September), 768-785], who used the Backpropagation Through Structure (BPTS) algorithm [Goller, C., & Küchler, A. (1996). Learning task-dependent distributed representations by back-propagation through structure. In Proceedings of the IEEE International Conference on Neural Networks (pp. 347-352); Tsoi, A. C. (1998). Adaptive processing of data structures: An expository overview and comments. Technical report, Faculty of Informatics, University of Wollongong, Wollongong, Australia] to carry out supervised learning. This supervised model has been applied successfully to a number of learning tasks involving complex symbolic structural patterns, such as image semantic structures, internet behavior, and chemical compounds. In this paper, we extend the model with probabilistic estimates that acquire discriminative information from the training patterns. With these probabilistic estimates, smooth discriminant boundaries can be obtained by clustering over the observed input attributes, which strengthens the model's ability to discriminate between classes of structural patterns. The proposed model consists of a set of Gaussian Mixture Models (GMMs) at the hidden layer and a set of "weighted-sum input to sigmoid function" models at the output layer. Its learning framework is divided into two phases: (a) locally unsupervised learning, which estimates the parameters of the GMMs, and (b) globally supervised learning, which fine-tunes the GMM parameters and optimizes the weights at the output layer. The unsupervised phase is formulated as a maximum-likelihood problem solved by the expectation-maximization (EM) algorithm.
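The locally unsupervised phase described above amounts to a standard EM fit of a Gaussian mixture. The following is a minimal one-dimensional sketch of that idea, not the paper's recursive model (which attaches a GMM to each hidden node over structured inputs); the function name and initialization scheme are our own illustrative choices.

```python
import numpy as np

def em_gmm(x, k=2, iters=50):
    """Fit a univariate k-component Gaussian mixture to x by EM.

    Returns mixing weights, means, and variances. Illustrative only:
    quantile-based initialization is an assumption, not the paper's method.
    """
    n = len(x)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means
    var = np.full(k, np.var(x))                    # shared initial variance
    pi = np.full(k, 1.0 / k)                       # uniform mixing weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Two well-separated clusters: EM should recover means near 0 and 5.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(5.0, 0.5, 200)])
pi, mu, var = em_gmm(data)
```

Each EM iteration is guaranteed not to decrease the data likelihood, which is what makes the maximum-likelihood formulation of this phase tractable.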
The supervised learning phase is formulated as a cost-minimization problem and solved by least-squares optimization or the Levenberg-Marquardt method. The capabilities of the proposed model are evaluated on several simulation platforms. The simulation results show that the proposed model not only outperforms the original recursive model in learning performance but is also significantly better at classifying and recognizing structural patterns.
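The globally supervised phase minimizes a squared-error cost at the "weighted-sum input to sigmoid" output layer. The sketch below substitutes plain gradient descent for the least-squares / Levenberg-Marquardt optimizers named above purely to stay short; the toy feature matrix and function names are hypothetical, and in the full model the features would be the hidden-layer GMM responses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_output_layer(features, targets, lr=1.0, epochs=2000):
    """Minimise sum((sigmoid(X @ w + b) - t)**2) by gradient descent.

    Stands in for the least-squares / Levenberg-Marquardt step;
    illustrative only.
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        y = sigmoid(features @ w + b)
        err = y - targets
        grad = err * y * (1.0 - y)        # chain rule through the sigmoid
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# Toy stand-in features (in the full model: hidden-layer GMM responses).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])        # class follows the first feature
w, b = train_output_layer(X, t)
pred = sigmoid(X @ w + b)
```

After training, `pred` falls on the correct side of 0.5 for all four points; a second-order method such as Levenberg-Marquardt would reach the same minimum in far fewer iterations, which is why the paper prefers it.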
