Improving generalization with active learning

Active learning differs from “learning from examples” in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers “useful.” We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization.
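Since the abstract describes selective sampling only at a high level, here is a minimal illustrative sketch of pool-based selective sampling using a simple uncertainty heuristic. It is not the SG-network from the paper; the target concept, pool size, seed set, query budget, and the use of a logistic-regression learner are all assumptions chosen purely for demonstration.

```python
# Minimal sketch of selective sampling (illustrative only, not the paper's SG-network):
# the learner sees unlabeled inputs drawn from the environment's distribution and
# asks the oracle to label only the points it currently finds most "useful"
# (here: the points its hypothesis is least certain about).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def oracle(x):
    # Noise-free binary concept: a half-plane through the origin (assumed target).
    return (x @ np.array([1.0, -2.0]) > 0).astype(int)

# Unlabeled pool supplies the distribution information.
pool = rng.normal(size=(2000, 2))

# Seed with a few randomly labeled examples (assumed to contain both classes).
labeled_idx = list(rng.choice(len(pool), size=10, replace=False))
query_budget = 40

clf = LogisticRegression()
for _ in range(query_budget):
    clf.fit(pool[labeled_idx], oracle(pool[labeled_idx]))
    # Query the pool point the current hypothesis is least sure about
    # (predicted probability closest to 0.5), i.e. a point inside the
    # learner's region of uncertainty, rather than a random draw.
    probs = clf.predict_proba(pool)[:, 1]
    uncertainty = np.abs(probs - 0.5)
    uncertainty[labeled_idx] = np.inf   # never re-query already-labeled points
    labeled_idx.append(int(np.argmin(uncertainty)))

# Final fit on all queried labels, then estimate generalization on fresh draws.
clf.fit(pool[labeled_idx], oracle(pool[labeled_idx]))
test = rng.normal(size=(5000, 2))
print("accuracy:", (clf.predict(test) == oracle(test)).mean())
```

With the same label budget, a passive learner would spend its queries uniformly over the input distribution; the point of the active loop above is that queries concentrate near the current decision boundary, which is where additional labels most constrain the hypothesis.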
