Hints

The systematic use of hints in the learning-from-examples paradigm is the subject of this review. Hints are the properties of the target function that are known to us independently of the training examples. The use of hints is tantamount to combining rules and data in learning, and is compatible with different learning models, optimization techniques, and regularization techniques. The hints are represented to the learning process by virtual examples, and the training examples of the target function are treated on equal footing with the rest of the hints. A balance is achieved between the information provided by the different hints through the choice of objective functions and learning schedules. The Adaptive Minimization algorithm achieves this balance by relating the performance on each hint to the overall performance. The application of hints in forecasting the very noisy foreign-exchange markets is illustrated. On the theoretical side, the information value of hints is contrasted to the complexity value and related to the VC dimension.

[1]  Feller William,et al.  An Introduction To Probability Theory And Its Applications , 1950 .

[2]  Ming-Kuei Hu,et al.  Visual pattern recognition by moment invariants , 1962, IRE Trans. Inf. Theory.

[3]  H. Akaike Fitting autoregressive models for prediction , 1969 .

[4]  Vladimir Vapnik,et al.  Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[5]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[6]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[7]  Geoffrey E. Hinton Learning Translation Invariant Recognition in Massively Parallel Networks , 1987, PARLE.

[8]  James L. McClelland Explorations In Parallel Distributed Processing , 1988 .

[9]  Marvin Minsky,et al.  Perceptrons: expanded edition , 1988 .

[10]  Yaser S. Abu-Mostafa,et al.  The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning , 1989, Neural Computation.

[11]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[12]  David E. Rumelhart,et al.  Predicting the Future: a Connectionist Approach , 1990, Int. J. Neural Syst..

[13]  J. Stephen Judd,et al.  Neural network design and the complexity of learning , 1990, Neural network modeling and connectionism.

[14]  Yaser S. Abu-Mostafa,et al.  Learning from hints in neural networks , 1990, J. Complex..

[15]  David E. Rumelhart,et al.  Generalization by Weight-Elimination with Application to Forecasting , 1990, NIPS.

[16]  Geoffrey E. Hinton,et al.  Adaptive Elastic Models for Hand-Printed Character Recognition , 1991, NIPS.

[17]  John E. Moody,et al.  Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction , 1991, NIPS.

[18]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[19]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[20]  John E. Moody,et al.  The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems , 1991, NIPS.

[21]  Steven C. Suddarth,et al.  Symbolic-Neural Systems and the Use of Hints for Developing Complex Systems , 1991, Int. J. Man Mach. Stud..

[22]  D. Rumelhart,et al.  Generalization through Minimal Networks with Application to Forecasting , 1992 .

[23]  T. Poggio,et al.  Recognition and Structure from one 2D Model View: Observations on Prototypes, Object Classes and Symmetries , 1992 .

[24]  William John Andrew Fyfe Invariance hints and the VC dimension , 1992 .

[25]  Yaser S. Abu-Mostafa,et al.  A Method for Learning From Hints , 1992, NIPS.

[26]  C. Lee Giles,et al.  Training Second-Order Recurrent Neural Networks using Hints , 1992, ML.

[27]  Pablo Moscato,et al.  An introduction to population approaches for optimization and hierarchical objective functions: A discussion on the role of tabu search , 1993, Ann. Oper. Res..

[28]  Yaser S. Abu-Mostafa,et al.  Hints and the VC Dimension , 1993, Neural Computation.

[29]  R. Palmer,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[30]  Todd K. Leen,et al.  From Data Distributions to Regularization in Invariant Learning , 1995, Neural Computation.

[31]  Charles M. Grinstead,et al.  Introduction to probability , 1986, Statistics for the Behavioural Sciences.