Everything Gets Better All the Time, Apart from the Amount of Data

The paper first reviews the main issues in current content-based image retrieval and concludes that the largest sources of innovation lie in the large size of the datasets, the ability to segment an image softly, the interactive specification of the user's wish, the sharpness and invariance of features, and the machine learning of concepts. Of these, everything gets better every year apart from the need for annotation, which gets worse with every increase in dataset size. We therefore direct our attention to two questions. What fraction of the images needs to be labeled to obtain almost the same result as when all images are labeled? And how can we design an interactive annotation scheme that puts up for annotation those images that are most informative in defining the concept (its boundaries)? We have developed an annotation scheme of random selection followed by sequential selection that requires annotating only 1% of the data, 25 items in a dataset of 2500 faces and non-faces, to yield a boundary of the face concept almost identical to the one obtained when all images are labeled. For this dataset, the approach reduces the annotation effort by 99%.
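The random-then-sequential idea can be illustrated with a small pool-based active-learning sketch. Several choices here are assumptions of ours, not the paper's method: the face and non-face images are replaced by a hypothetical pool of 2500 two-dimensional points drawn from two overlapping Gaussian blobs, the classifier is a plain logistic regression, "most informative" is taken to mean the unlabeled item closest to the current decision boundary (uncertainty sampling), and the split of the 25-item budget into 5 random and 20 sequential picks is arbitrary. All function names are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Signed decision value; its magnitude grows with distance from the boundary.
    return w[0] + w[1] * x[0] + w[2] * x[1]


def train(X, y, epochs=200, lr=0.1):
    # Assumption: a two-feature logistic regression fitted by batch gradient
    # descent stands in for whatever classifier the paper uses.
    w = [0.0, 0.0, 0.0]
    n = len(X)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for x, t in zip(X, y):
            err = sigmoid(score(w, x)) - t
            g[0] += err
            g[1] += err * x[0]
            g[2] += err * x[1]
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


def make_pool(n=2500, seed=0):
    # Hypothetical stand-in for faces (label 1) and non-faces (label 0):
    # two overlapping 2-D Gaussian blobs, exactly half of each class.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append((rng.gauss(2.0 if label else -2.0, 1.0), rng.gauss(0.0, 1.0)))
        y.append(label)
    return X, y


def random_then_sequential(X, oracle, budget=25, n_random=5, seed=1):
    # Phase 1: the annotator (oracle) labels a few randomly chosen items.
    rng = random.Random(seed)
    labeled = rng.sample(range(len(X)), n_random)
    # Phase 2 (assumed criterion): repeatedly ask for the label of the
    # unlabeled item nearest the current decision boundary.
    while len(labeled) < budget:
        w = train([X[i] for i in labeled], [oracle(i) for i in labeled])
        taken = set(labeled)
        nxt = min((i for i in range(len(X)) if i not in taken),
                  key=lambda i: abs(score(w, X[i])))
        labeled.append(nxt)
    w = train([X[i] for i in labeled], [oracle(i) for i in labeled])
    return w, labeled


X, y = make_pool()
w_all = train(X, y)  # reference boundary: every item labeled
w_act, labeled = random_then_sequential(X, lambda i: y[i])
agree = sum((score(w_all, x) > 0) == (score(w_act, x) > 0) for x in X) / len(X)
print(f"labeled {len(labeled)} of {len(X)} ({len(labeled) / len(X):.0%}); "
      f"agreement with all-labels boundary: {agree:.3f}")
```

The printed agreement compares the boundary learned from 25 labels with the one learned from all 2500 labels; on this toy data it only demonstrates the mechanism and is not the paper's reported result. A budget of 25 out of 2500 items means 2475 items are never labeled, which is the 99% reduction in annotation effort.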
