Toward a Theory of Learning Coherent Concepts

We develop a theory for learning scenarios in which multiple learners co-exist but mutual compatibility constraints are imposed on their outcomes. This is natural in cognitive learning situations, where "natural" compatibility constraints are imposed on the outcomes of classifiers so that a valid sentence, image, or other domain representation is produced. We suggest that work in this direction may help to resolve the contrast between the hardness of learning predicted by current theoretical models and the apparent ease with which cognitive systems seem to learn. We study a model of concept learning in which the target concept is required to cohere with other concepts of interest, where coherency is expressed via a (Boolean) constraint that the concepts must satisfy. Under this model, learning a concept is shown to be easier (in terms of sample complexity and mistake bounds), and the learned concepts are shown to be more robust to noise in their input (attribute noise). These properties are established for half spaces, and the connection to large margin theory is discussed.
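As a rough illustration of the coherency idea, the sketch below trains a perceptron on a halfspace whose labels are required to cohere with a second, already-known concept; only instances satisfying the Boolean constraint are retained. Restricting the instance space in this way can enlarge the effective margin, which is the mechanism behind the improved mistake bounds mentioned above. All concrete choices here (the auxiliary concept g, the constraint coherent, the target weights, the sampling scheme) are hypothetical and serve only to make the idea concrete; this is a minimal sketch, not the paper's construction or experimental setup.

```python
# Minimal sketch, assuming a linearly separable target and a known auxiliary
# concept g; every name below is illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # Auxiliary concept the target must cohere with (hypothetical choice).
    return x[0] + x[1] > 0

def coherent(x, label):
    # Boolean coherency constraint: the target's label must agree with g(x).
    return (label > 0) == g(x)

def target(x):
    # True halfspace to be learned (hidden from the learner).
    return 1.0 if x @ np.array([1.0, 1.0, -0.5]) > 0 else -1.0

def perceptron(samples, dim, epochs=10):
    # Standard perceptron; restricting training to the coherent instance
    # space can enlarge the margin and hence shrink the mistake bound.
    w = np.zeros(dim)
    mistakes = 0
    for _ in range(epochs):
        for x, y in samples:
            if y * (w @ x) <= 0:
                w += y * x
                mistakes += 1
    return w, mistakes

# Keep only instances on which the target's label satisfies the constraint,
# simulating the coherency-restricted instance space.
data = []
while len(data) < 200:
    x = rng.uniform(-1.0, 1.0, size=3)
    y = target(x)
    if coherent(x, y):
        data.append((x, y))

w, mistakes = perceptron(data, dim=3)
print("perceptron mistakes on the coherent instance space:", mistakes)
```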
