Image retrieval by content: a machine learning approach

In areas as diverse as Earth remote sensing, astronomy, and medical imaging, there has been an explosive growth in the amount of image data available for creating digital image libraries. However, the lack of automated analysis and useful retrieval methods stands in the way of creating true digital image libraries. To perform query-by-content searches, the query formulation problem needs to be addressed: users often cannot express the targets of their searches as explicit queries. We present a natural and powerful approach to this problem to assist scientists in exploring large digital image libraries. We target a system that the user trains to find particular patterns by providing it with examples. The learning algorithms use these training data to produce classifiers that detect and identify other targets in the large image collection. This forms the basis for query-by-content capabilities and for library indexing. We ground the discussion by presenting two such applications at JPL: the SKICAT system, used for the reduction and analysis of a 3-terabyte astronomical data set, and the JARtool system, to be used in automatically analyzing the Magellan data set consisting of over 30,000 images of the surface of Venus. We also discuss general issues that affect the application of learning algorithms to image analysis.
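
The train-by-example idea can be illustrated with a minimal sketch. It assumes each image region has already been reduced to a numeric feature vector, and it uses a nearest-centroid classifier purely as a stand-in: the abstract does not say which learning algorithms SKICAT or JARtool use, and every name, feature, and value below is hypothetical.

```python
from math import dist
from statistics import fmean


def train(examples):
    """Learn one centroid per label from user-supplied (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # The centroid is the per-dimension mean of all vectors sharing a label.
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def query_by_content(centroids, library, target_label):
    """Return the ids of library items that the classifier assigns to target_label."""
    return [
        image_id
        for image_id, features in library.items()
        if classify(centroids, features) == target_label
    ]


# Hypothetical two-dimensional features for regions the user has labeled.
training_examples = [
    ((0.9, 0.8), "target"),
    ((0.8, 0.9), "target"),
    ((0.1, 0.2), "background"),
    ((0.2, 0.1), "background"),
]

# Hypothetical unlabeled library: image id -> feature vector.
library = {
    "img_001": (0.85, 0.75),
    "img_002": (0.15, 0.25),
    "img_003": (0.70, 0.95),
}

centroids = train(training_examples)
matches = query_by_content(centroids, library, "target")
print(matches)
```

In this sketch the user never writes a query: the labeled examples define the target, and the learned classifier is then applied to the rest of the collection, which is also how it could serve as an index.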