Ground truth and benchmarks for performance evaluation

Progress in algorithm development, and the transfer of results to practical applications such as military robotics, requires standard tasks and standard qualitative and quantitative measures for performance evaluation and validation. Although the evaluation and validation of algorithms have been discussed for over a decade, the research community still lacks a well-defined, standardized methodology. The fundamental problems include a lack of quantifiable performance measures, a lack of data from state-of-the-art sensors in calibrated real-world environments, and a lack of facilities for conducting realistic experiments. In this research, we propose three methods for creating ground truth databases and benchmarks using multiple sensors. The databases and benchmarks will provide researchers with high-quality data from sensor suites operating in complex environments that represent real problems of direct relevance to the development of autonomous driving systems. At NIST, we have prototyped a High Mobility Multipurpose Wheeled Vehicle (HMMWV) system with a suite of sensors, including a Riegl ladar, a GDRS ladar, stereo CCD cameras, several color cameras, a Global Positioning System (GPS), an Inertial Navigation System (INS), pan/tilt encoders, and odometry. All sensors are calibrated with respect to one another in space and time, which allows a database of features and terrain elevation to be built; ground truth for each sensor can then be extracted from this database. The main goal of this research is to provide ground truth databases with which researchers and engineers can evaluate algorithms for effectiveness, efficiency, reliability, and robustness, thereby advancing algorithm development.
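To make the pipeline concrete, the Python sketch below shows one way spatially and temporally calibrated sensors could feed a shared terrain-elevation database from which per-sensor ground truth is extracted. It is a minimal illustration, not the NIST implementation: the function names, the homogeneous-transform calibration, the 0.25 m grid cell, and the max-z-per-cell elevation policy are all assumptions made for the example.

    import numpy as np

    CELL = 0.25  # elevation-grid cell size in meters (illustrative choice)

    def to_world(points_sensor, T_vehicle_sensor, T_world_vehicle):
        # Map Nx3 sensor-frame points into the world frame using the
        # sensor-to-vehicle extrinsic calibration and the vehicle pose
        # (e.g., from GPS/INS) interpolated to the scan timestamp.
        pts_h = np.c_[points_sensor, np.ones(len(points_sensor))]  # homogeneous coords
        return (T_world_vehicle @ T_vehicle_sensor @ pts_h.T).T[:, :3]

    def accumulate_elevation(grid, points_world):
        # Insert world-frame points into a sparse elevation map keyed by
        # (x, y) cell index, keeping the highest observed z per cell.
        for x, y, z in points_world:
            key = (int(np.floor(x / CELL)), int(np.floor(y / CELL)))
            grid[key] = max(grid.get(key, -np.inf), z)
        return grid

    def ground_truth_elevation(grid, x, y):
        # Look up the reference elevation at a query location; a sensor's
        # reading there can then be scored against this value.
        return grid.get((int(np.floor(x / CELL)), int(np.floor(y / CELL))))

    # Hypothetical calibration: ladar mounted 1.5 m above the vehicle origin.
    T_vehicle_sensor = np.eye(4)
    T_vehicle_sensor[2, 3] = 1.5
    T_world_vehicle = np.eye(4)  # vehicle at the world origin for this example
    scan = np.array([[5.0, 0.0, -1.3], [5.2, 0.3, -1.4]])  # sensor-frame returns
    grid = accumulate_elevation({}, to_world(scan, T_vehicle_sensor, T_world_vehicle))
    print(ground_truth_elevation(grid, 5.0, 0.0))  # ~0.2 m above the world datum

A real terrain database would record richer per-cell statistics (e.g., mean and variance rather than a single maximum) and interpolate the vehicle pose between GPS/INS fixes, but the structure is the same: calibrated transforms into a common frame, accumulation into a georeferenced grid, and lookup against that grid.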
