What is there in a training sample?

Two factors that are known to have a direct influence on the classification accuracy of any neural network are (1) the network complexity and (2) the representational accuracy of the training data. While pruning algorithms are used to tackle the complexity problem, no direct solutions are known for the second. Selecting training data at random from the sample space is the most commonly followed method, but despite its simplicity it does not guarantee that the training will be optimal. In this brief paper, we present a new method that is specific to a difference boosting neural network (DBNN) but could probably be extended to other networks as well. The method is iterative and fast, ensuring optimal selection of a minimal training set from a larger sample in an automated manner. We test the performance of the new method on some of the well-known datasets from the UCI repository for benchmarking machine learning tools and show that, in almost all cases, it performs better than any published method of comparable network complexity while requiring only a fraction of the usual training data, thereby making learning faster and more generic.
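The abstract does not spell out the selection mechanism, so the following is only a minimal sketch of one plausible reading: an iterative loop that starts from a small random subset, trains a classifier, and moves the examples that the current model misclassifies from the remaining pool into the training set until accuracy on the pool stops improving. The function name `select_training_samples`, the parameters `seed_size`, `max_rounds`, and `tol`, and the use of scikit-learn's `GaussianNB` as a stand-in for the DBNN are all assumptions for illustration, not the authors' algorithm.

```python
# Hypothetical sketch (not the paper's exact algorithm): grow a small training
# set by repeatedly adding the pool examples the current model misclassifies,
# stopping when accuracy on the remaining pool stops improving.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB  # assumed stand-in for the DBNN


def select_training_samples(X, y, seed_size=10, max_rounds=20, tol=1e-3, rng=None):
    rng = np.random.default_rng(rng)
    # start from a small random seed subset
    selected = list(rng.choice(len(X), size=seed_size, replace=False))
    prev_acc = 0.0
    model = None
    for _ in range(max_rounds):
        model = GaussianNB().fit(X[selected], y[selected])
        pool = np.setdiff1d(np.arange(len(X)), selected)
        pred = model.predict(X[pool])
        acc = (pred == y[pool]).mean()
        wrong = pool[pred != y[pool]]
        # stop once pool accuracy no longer improves or nothing is misclassified
        if acc - prev_acc < tol or len(wrong) == 0:
            break
        selected.extend(wrong.tolist())
        prev_acc = acc
    return np.array(selected), model


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    idx, model = select_training_samples(X, y, rng=0)
    print(f"selected {len(idx)} of {len(X)} samples")
```

On a small benchmark such as Iris, a loop of this kind typically settles on a training subset that is only a fraction of the full dataset, which is the behaviour the abstract claims for the proposed method.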