Robust position, scale, and rotation invariant object recognition using higher-order neural networks

Abstract. For object recognition invariant to changes in an object's position, size, and in-plane rotation, higher-order neural networks (HONNs) have numerous advantages over other neural network approaches. Because distortion invariance can be built into the architecture of the network, HONNs need to be trained on only one view of each object, rather than on numerous distorted views, which reduces training time significantly. Further, 100% accuracy can be guaranteed for noise-free test images characterized by the built-in distortions. Specifically, a third-order neural network trained on a single view of an SR-71 aircraft and a U-2 aircraft in a 127 × 127 pixel input field successfully recognized all views of both aircraft larger than 70% of the original size, regardless of the orientation or position of the test image. Training required just six passes. In contrast, other neural network approaches require thousands of passes through a training set consisting of a much larger number of training images and typically achieve only 80–90% accuracy on novel views of the objects. The above results assume a noise-free environment. Here, the performance of HONNs is explored with non-ideal test images characterized by white Gaussian noise or partial occlusion. With white noise added to images with an ideal separation of background and foreground gray levels, HONNs achieve 100% recognition accuracy on the test set for noise standard deviations up to ∼10% of the maximum gray value and continue to show good performance (defined as better than 75% accuracy) up to a standard deviation of ∼14%. HONNs are also robust to partial occlusion. For a test set of training images with very similar profiles, HONNs achieve 100% recognition accuracy for one occlusion of up to ∼13% of the input field size or four occlusions of up to ∼7% of the input field size each. They show good performance for one occlusion of ∼23% of the input field size or four occlusions of ∼15% of the input field size each.
For training images with very different profiles, HONNs achieve 100% recognition accuracy for the test set for up to four occlusions of ∼2% of the input field size and continue to show good performance for up to four occlusions of ∼23% of the input field size each.
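The built-in invariance claimed above comes from connecting each third-order unit to triples of input pixels and tying one weight to all triples that form similar triangles, i.e., triangles with the same set of interior angles, which are unchanged by translation, rotation, and scaling. A minimal Python sketch of this idea follows; the bin count, helper names, and histogram encoding are illustrative assumptions, not details taken from the paper:

```python
import itertools
import math
import numpy as np

def triangle_angles(p1, p2, p3):
    """Interior angles of the triangle formed by three pixel coordinates.

    The angle set is unchanged by translation, rotation, and uniform
    scaling of the image, which is the source of the built-in invariance.
    """
    def angle_at(a, b, c):
        # Angle at vertex a, between rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))
    return (angle_at(p1, p2, p3), angle_at(p2, p1, p3), angle_at(p3, p1, p2))

def invariant_features(image, n_bins=8):
    """Histogram over quantized, sorted interior-angle triples.

    All pixel triples forming similar triangles fall into the same bin,
    so any classifier trained on this feature vector is position-, scale-,
    and in-plane-rotation-invariant by construction.  (Exhaustive triple
    enumeration is O(N^3) in the number of active pixels, which is why
    the paper's 127 x 127 field is computationally demanding.)
    """
    pts = list(zip(*np.nonzero(image)))
    feats = np.zeros(n_bins ** 3)
    for p1, p2, p3 in itertools.combinations(pts, 3):
        idx = 0
        for a in sorted(triangle_angles(p1, p2, p3)):
            idx = idx * n_bins + min(int(a / math.pi * n_bins), n_bins - 1)
        feats[idx] += 1
    return feats
```

Because translating or rotating the input permutes the pixel triples without changing any triangle's angles, the feature vector is bit-for-bit identical across such distortions, which is why a single training view per object suffices.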