Neural methods for non-standard data

Standard pattern recognition provides effective and noise-tolerant tools for machine learning tasks; however, most approaches deal only with real vectors of finite and fixed dimensionality. In this tutorial paper, we give an overview of extensions of pattern recognition to non-standard data that are not contained in a finite-dimensional space, such as strings, sequences, trees, graphs, or functions. Two major directions can be distinguished in the neural networks literature. First, models can be based on a similarity measure adapted to non-standard data; kernel methods for structures are a very prominent approach here, alongside alternative metric-based algorithms and functional networks. Second, non-standard data can be processed recursively within supervised and unsupervised recurrent and recursive networks and fully recurrent systems.
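As a rough illustration of the first direction (a similarity measure adapted to non-standard data), the sketch below computes a k-spectrum string kernel in Python. It is not taken from the paper; the function names `spectrum_features` and `spectrum_kernel` and the default `k = 2` are assumptions chosen for this example. The kernel value is the inner product of the two strings' substring-count vectors, so strings of different lengths can be compared without first embedding them in a fixed-dimensional vector space.

```python
from collections import Counter


def spectrum_features(s: str, k: int) -> Counter:
    """Count every contiguous substring of length k in s (hypothetical helper)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """Inner product of the k-mer count vectors of s and t.

    Illustrative only: one simple instance of a kernel on strings,
    not the specific kernels surveyed in the paper.
    """
    fs = spectrum_features(s, k)
    ft = spectrum_features(t, k)
    # Counter returns 0 for k-mers that do not occur in t.
    return sum(count * ft[kmer] for kmer, count in fs.items())


if __name__ == "__main__":
    print(spectrum_kernel("abab", "abab"))
    print(spectrum_kernel("abab", "ba"))
    print(spectrum_kernel("abc", "xyz"))
```

Because the result is an ordinary inner product in the space of substring counts, such a function could in principle be plugged into any kernel-based learner in place of the usual dot product on real vectors.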
