The average person with a networked computer can now understand why computers should have vision: to search the world's collections of digital video and images and "retrieve a picture of ...". Computer vision for intelligent browsing, querying, and retrieval of imagery is needed now, yet traditional approaches to computer vision remain far from a general solution to the scene-understanding problem. In this paper I discuss the need for a solution that combines high-level and low-level vision and works in concert with input from a human user. The solution is based on: 1) learning from the user what is important visually, and 2) learning associations between text descriptions and visual data. I describe some recent results in these areas and overview key challenges for future research in computer vision for digital libraries.