Efficient agnostic PAC-learning with simple hypotheses

We exhibit efficient algorithms for agnostic PAC-learning with rectangles, unions of two rectangles, and unions of k intervals as hypotheses. These hypothesis classes are of interest from the point of view of applied machine learning, because empirical studies show that hypotheses of this simple type (involving just one or two of the attributes) provide good prediction rules for various real-world classification problems. In addition, optimal hypotheses of this type may provide valuable heuristic insight into the structure of a real-world classification problem. The algorithms introduced in this paper make it feasible to compute optimal hypotheses of this type for a training set of several hundred examples. We also exhibit an approximation algorithm that can compute nearly optimal hypotheses for much larger datasets.
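To illustrate the flavor of agnostic learning with such simple hypothesis classes, the following is a minimal sketch of the one-dimensional case: finding a union of at most k intervals that misclassifies as few labeled points as possible. This is a standard dynamic program over the sorted points, not the paper's algorithm; the function name and interface are illustrative assumptions.

```python
def best_k_intervals(points, k):
    """Minimum number of misclassified points over all hypotheses that
    predict 1 on a union of at most k intervals and 0 elsewhere.

    points: list of (x, label) pairs with label in {0, 1}.
    Runs in O(n * k) time after sorting. Illustrative sketch only,
    not the algorithm from the paper.
    """
    pts = sorted(points)
    INF = float("inf")
    # dp[j][s]: min errors so far, having opened j intervals,
    # s = 1 if the previous point was predicted inside an interval.
    dp = [[INF, INF] for _ in range(k + 1)]
    dp[0][0] = 0
    for _, label in pts:
        ndp = [[INF, INF] for _ in range(k + 1)]
        for j in range(k + 1):
            for s in (0, 1):
                if dp[j][s] == INF:
                    continue
                # Option 1: predict 0 at this point (pay 1 if label is 1).
                cost0 = dp[j][s] + (label == 1)
                ndp[j][0] = min(ndp[j][0], cost0)
                # Option 2: predict 1; opening a new interval costs one
                # of the k interval "slots" if we were outside.
                nj = j + (1 if s == 0 else 0)
                if nj <= k:
                    cost1 = dp[j][s] + (label == 0)
                    ndp[nj][1] = min(ndp[nj][1], cost1)
        dp = ndp
    return min(min(row) for row in dp)
```

In the agnostic setting no hypothesis in the class need be consistent with the data, so the goal is exactly this kind of training-error minimization; the higher-dimensional rectangle cases in the paper require more sophisticated techniques than this one-dimensional sweep.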
