Image and Video Retrieval

The past decade has seen an explosion of research interest in multimedia content analysis, whose goal has been to derive automatic methods for high-level description and annotation. In this paper we summarize the main research topics in this area and state some of the assumptions that we have been using all along. We also postulate the main future trends, including the use of long-term memory, context, dynamic processing, evolvable generalized detectors, and user aspects.
