Strong Universal Consistency of Neural Network Classifiers

In statistical pattern recognition, a classifier is called universally consistent if its error probability converges to the Bayes risk as the size of the training data grows, for all possible distributions of the random variable pair of the observation vector and its class. We prove that if a one-layered neural network is trained to minimize the empirical risk on the training data, then it results in a universally consistent classifier if the number of nodes $k$ is chosen such that $k \to \infty$ and $k \log(n)/n \to 0$ as the size of the training data $n$ grows to infinity. We show that if certain smoothness conditions on the distribution are satisfied, then by choosing $k = O\bigl(\sqrt{n/\log n}\bigr)$, the exponent in the rate of convergence does not depend on the dimension.
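Spelled out, with $(X, Y)$ denoting the observation-label pair, $L^*$ the Bayes risk, $D_n$ the training data of size $n$, and $g_n$ the classifier obtained by empirical risk minimization over one-layered networks with $k = k_n$ nodes (this notation is introduced here only for illustration and is not fixed by the abstract), the strong universal consistency claim can be sketched as:

\[
  k_n \to \infty
  \quad\text{and}\quad
  \frac{k_n \log n}{n} \to 0
  \qquad\Longrightarrow\qquad
  L(g_n) \;=\; \Pr\{\, g_n(X) \neq Y \mid D_n \,\} \;\longrightarrow\; L^*
  \quad\text{almost surely,}
\]

for every distribution of $(X, Y)$, where "strong" refers to the almost sure convergence of the (data-dependent) error probability $L(g_n)$ to the Bayes risk.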