Salient features, combined detectors and image flipping: an approach to Haar cascades for recognising horses and other complex, deformable objects

The author describes a new 'shortcut' approach to automatically detecting horses in still images and video, built on three techniques: salient features, combined detectors and image flipping. Horses are complex, deformable (non-rigid) target objects with high intra-class shape variability. A prototype Haar cascade detector was trained to detect what the author calls a 'salient feature': a distinctive, minimally changing physical attribute that is easily recognisable from multiple viewpoints. The detector's target object is 'horse ears', and it required a total training time of only 91 minutes. It was evaluated in combination with an existing 'asymmetric' detector (trained only to recognise right-facing horses). Combining the existing horse detector with the author's salient feature ears detector increased the hit rate for true positives by 50% (relative to the existing detector's performance alone). Flipping each test image (or video frame) around its vertical axis increased the existing, asymmetric detector's hit rate by 83% (relative to the unflipped results) when tested on an image dataset of horses facing in both directions.
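The flipping and combining steps can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a detector is any callable that takes an image (here a plain list of pixel rows) and returns bounding boxes as (x, y, width, height). All names below, including the toy detector, are hypothetical and stand in for trained Haar cascades.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height), assumed box format
Image = Sequence[Sequence[int]]      # rows of pixel values (toy representation)
Detector = Callable[[Image], List[Box]]


def flip_horizontal(image: Image) -> List[List[int]]:
    """Mirror the image around its vertical axis (left becomes right)."""
    return [list(reversed(row)) for row in image]


def unflip_box(box: Box, image_width: int) -> Box:
    """Map a box found in the flipped image back to original coordinates."""
    x, y, w, h = box
    return (image_width - x - w, y, w, h)


def detect_with_flip(detector: Detector, image: Image) -> List[Box]:
    """Run an asymmetric detector on the image and on its mirror image."""
    width = len(image[0])
    boxes = list(detector(image))
    for box in detector(flip_horizontal(image)):
        mapped = unflip_box(box, width)
        if mapped not in boxes:      # skip exact duplicates only
            boxes.append(mapped)
    return boxes


def detect_combined(detectors: Sequence[Detector], image: Image) -> List[Box]:
    """Union of the boxes reported by several detectors (assumed combination rule)."""
    boxes: List[Box] = []
    for detector in detectors:
        for box in detector(image):
            if box not in boxes:
                boxes.append(box)
    return boxes


def toy_right_facing_detector(image: Image) -> List[Box]:
    """Hypothetical stand-in for a right-facing cascade: fires on the pattern 1, 2."""
    boxes = []
    for y, row in enumerate(image):
        for x in range(len(row) - 1):
            if row[x] == 1 and row[x + 1] == 2:
                boxes.append((x, y, 2, 1))
    return boxes


if __name__ == "__main__":
    left_facing = [[2, 1, 0, 0, 0]]  # mirrored pattern: missed without flipping
    print(toy_right_facing_detector(left_facing))                    # []
    print(detect_with_flip(toy_right_facing_detector, left_facing))  # [(0, 0, 2, 1)]
```

In this toy case the left-facing pattern is missed by the unflipped pass and recovered by the flipped pass, with the box mapped back to the original image coordinates. A real pipeline would replace the toy detector with trained cascades and would probably merge overlapping boxes rather than only dropping exact duplicates.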