Automatic Pattern Recognition: A Study of the Probability of Error

A test sequence is used to select the best rule from a class of discrimination rules defined in terms of the training sequence. The Vapnik-Chervonenkis inequality and related inequalities yield distribution-free bounds on the difference between the probability of error of the selected rule and that of the best rule in the given class. These bounds are used to prove consistency and asymptotic optimality for several popular classes, including linear discriminators, nearest-neighbor rules, kernel-based rules, histogram rules, binary tree classifiers, and Fourier series classifiers. In particular, the method can be used to choose the smoothing parameter in kernel-based rules, to choose k in the k-nearest neighbor rule, and to choose between parametric and nonparametric rules.
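The selection step described in the abstract can be illustrated with a minimal sketch: a training sequence defines a small class of k-nearest neighbor rules, and a separate test sequence picks the rule with the fewest test errors. This is not the paper's own code. The one-dimensional synthetic data, the 10% label noise, the candidate values of k, and every function name below are assumptions made only for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0  # ties (even k) fall to class 0

def holdout_error(train, test, k):
    """Fraction of test points misclassified by the k-NN rule built on train."""
    errors = sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(test)

def select_k(train, test, candidates):
    """Return the candidate k with the smallest error on the test sequence."""
    return min(candidates, key=lambda k: holdout_error(train, test, k))

def make_data(n, rng, noise=0.1):
    """Hypothetical data: label is 1 when x > 0.5, flipped with probability noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)                 # fixed seed so the run is repeatable
train = make_data(200, rng)            # training sequence: defines the rules
test = make_data(200, rng)             # test sequence: used only for selection
candidates = [1, 3, 5, 9, 15]          # assumed finite class of k values
best_k = select_k(train, test, candidates)
best_err = holdout_error(train, test, best_k)
print(f"selected k = {best_k}, test error = {best_err:.3f}")
```

Because the test sequence is independent of the training sequence, each candidate's test error is an average of independent bounded variables once the training sequence is fixed. For a finite class such as this one, Hoeffding's inequality with a union bound is enough to control the gap between the selected rule and the best rule in the class; the Vapnik-Chervonenkis inequality plays the same role for infinite classes.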
