Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation

In pattern recognition, statistical modeling, and regression, the amount of available data is a critical factor affecting performance. If data and computational resources were unlimited, even trivial algorithms would converge to the optimal solution. In practice, however, data and other resources are limited, and satisfactory performance requires more sophisticated methods that regularize the problem by introducing a priori knowledge. Invariance of the output with respect to certain transformations of the input is a typical example of such a priori knowledge. In this chapter, we introduce the concept of tangent vectors, which compactly represent the essence of these transformation invariances, and two classes of algorithms, “tangent distance” and “tangent propagation”, which make use of these invariances to improve performance.
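As a minimal sketch of the two ideas named above (a toy illustration, not the chapter's own code), the Python below uses a 1-D Gaussian bump as a stand-in for an image and translation as the only transformation. `translation_tangent` approximates the tangent vector by finite differences. `tangent_distance` finds the smallest Euclidean distance between the linear approximations x + T_x a and y + T_y b of the two patterns' transformation manifolds; it is one-sided when no tangents are given for y. `tangent_prop_penalty` measures how much a scalar model's output changes along each tangent vector, which is the quantity tangent propagation adds to the training cost. All helper names, the 1-D setting, the ridge term `reg` and the finite-difference step `eps` are assumptions made for this sketch.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def translation_tangent(x):
    """Finite-difference tangent vector of a 1-D pattern for translation.

    The exact tangent is -x'(t); its sign does not matter because the
    coefficient that multiplies it can be negative. Needs len(x) >= 2.
    """
    n = len(x)
    t = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)  # one-sided at the ends
        t.append((x[hi] - x[lo]) / (hi - lo))
    return t


def solve(a, b):
    """Solve the small linear system a z = b (Gaussian elimination, pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def tangent_distance(x, y, tx, ty, reg=1e-8):
    """Distance between the tangent planes x + Tx a and y + Ty b.

    tx, ty: lists of tangent vectors at x and y (pass ty=[] for the
    one-sided version). reg is a small ridge term, an assumption added
    here to keep the normal equations well conditioned.
    """
    cols = list(tx) + [[-v for v in t] for t in ty]  # M = [Tx | -Ty]
    r = [xi - yi for xi, yi in zip(x, y)]             # r = x - y
    k = len(cols)
    gram = [[dot(cols[i], cols[j]) + (reg if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    coef = solve(gram, [-dot(c, r) for c in cols])    # M^T M c = -M^T r
    resid = list(r)
    for c, col in zip(coef, cols):
        for i, v in enumerate(col):
            resid[i] += c * v                         # resid = r + M c
    return math.sqrt(dot(resid, resid))


def tangent_prop_penalty(f, x, tangents, eps=1e-4):
    """Sum of squared directional derivatives of scalar model f along tangents.

    Tangent propagation computes these derivatives exactly by propagating
    the tangent vectors through the network; central differences are used
    here only as a simple stand-in.
    """
    total = 0.0
    for t in tangents:
        plus = f([xi + eps * ti for xi, ti in zip(x, t)])
        minus = f([xi - eps * ti for xi, ti in zip(x, t)])
        total += ((plus - minus) / (2 * eps)) ** 2
    return total


def bump(center, n=64, width=4.0):
    """Toy 1-D 'pattern': a Gaussian bump sampled at n integer positions."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(n)]


x, y = bump(30.0), bump(31.0)  # y is x translated by one sample
tx, ty = translation_tangent(x), translation_tangent(y)
diff = [a - b for a, b in zip(x, y)]
euclid = math.sqrt(dot(diff, diff))
td_one = tangent_distance(x, y, [tx], [])
td_two = tangent_distance(x, y, [tx], [ty])
print(f"Euclidean {euclid:.4f}  one-sided TD {td_one:.4f}  two-sided TD {td_two:.4f}")

# A linear model with weights equal to the tangent is far from translation
# invariant; for this linear f the penalty equals (tx . tx)^2.
penalty = tangent_prop_penalty(lambda v: dot(tx, v), x, [tx])
print(f"tangent-prop penalty {penalty:.4f}")
```

The shifted bump is far from the original in Euclidean distance but close in tangent distance, and the two-sided version, which lets both patterns move along their tangent planes, is never larger than the one-sided version (up to the tiny ridge term). A real classifier would use several tangent vectors per image, one per transformation, and solve the same small least-squares system for each comparison.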
