Mining from large image sets (Keynote address)

So far, most image mining was based on interactive querying. Although such querying will remain important in the future, several applications need image mining at such wide scales that it has to run automatically. This adds an additional level to the problem, namely to apply appropriate further processing to different types of images, and to decide on such processing automatically as well. This paper touches on those issues in that we discuss the processing of landmark images and of images coming from webcams. The first part deals with the automated collection of images of landmarks, which are then also automatically annotated and enriched with Wikipedia information. The target application is that users photograph landmarks with their mobile phones or PDAs, and automatically get information about them. Similarly, users can get images in their photo albums annotated automatically. The object of interest can also be automatically delineated in the images. The pipeline we propose actually retrieves more images than manual keyword input would produce. The second part of the paper deals with an entirely different source of image data, but one that also produces massive amounts (although typically not archived): webcams. They produce images at a single location, but rather continuously and over extended periods of time. We propose an approach to summarize data coming from webcams. This data handling is quite different from that applied to the landmark images.

[1]  Maarten Vergauwen,et al.  Web-based 3D Reconstruction Service , 2006, Machine Vision and Applications.

[2]  Jianbo Shi,et al.  Detecting unusual activity in video , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Michael Goesele,et al.  Multi-View Stereo for Community Photo Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[6]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[9]  Alex Pentland,et al.  A Bayesian Computer Vision System for Modeling Human Interactions , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Tim J. Ellis,et al.  Learning semantic scene models from observing activity in visual surveillance , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[11]  Ales Leonardis,et al.  High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Yair Weiss,et al.  Deriving intrinsic images from image sequences , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[13]  Robert Pless,et al.  Consistent Temporal Variations in Many Outdoor Scenes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[15]  Yael Pritch,et al.  Webcam Synopsis: Peeking Around the World , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Andrew Zisserman,et al.  “Who are you?” - Learning person specific classifiers from video , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  W. Eric L. Grimson,et al.  Learning Patterns of Activity Using Real-Time Tracking , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[19]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[20]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[21]  Tieniu Tan,et al.  A system for learning statistical motion patterns , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Michael Werman,et al.  An On-Line Agglomerative Clustering Method for Nonstationary Data , 1999, Neural Computation.

[23]  Alexei A. Efros,et al.  What Does the Sky Tell Us about the Camera? , 2008, ECCV.

[24]  Wojciech Matusik,et al.  What do color changes reveal about an outdoor scene? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Shaogang Gong,et al.  Scene Segmentation for Behaviour Correlation , 2008, ECCV.

[27]  W. Eric L. Grimson,et al.  Trajectory analysis and semantic region modeling using a nonparametric Bayesian model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  David C. Hogg,et al.  Learning the Distribution of Object Trajectories for Event Recognition , 1995, BMVC.

[29]  Luc Van Gool,et al.  Hunting Nessie - Real-time abnormality detection from webcams , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.