Statistical media analysis in video indexing

Media analysis for video indexing is witnessing an increasing influence of statistical techniques. Examples of these techniques include the use of generative models as well as discriminant techniques for video structuring, classification, summarization, indexing, and retrieval. Advances in multimedia analysis are related directly to advances in signal processing, computer vision, pattern recognition, multimedia databases, and smart sensors. This paper highlights the statistical techniques in multimedia retrieval with particular emphasis on semantic characterization.

[1]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[2]  Milind R. Naphade,et al.  Novel scheme for fast and efficent video sequence matching using compact signatures , 1999, Electronic Imaging.

[3]  Zhu Liu,et al.  Classification TV programs based on audio information using hidden Markov model , 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175).

[4]  John R. Smith,et al.  Learning to annotate video databases , 2001, IS&T/SPIE Electronic Imaging.

[5]  Alex Pentland,et al.  Unsupervised clustering of ambulatory audio and video , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[6]  David A. Landgrebe,et al.  The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon , 1994, IEEE Trans. Geosci. Remote. Sens..

[7]  Milind R. Naphade,et al.  Classifying motion picture soundtrack for video indexing , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[8]  Takeo Kanade,et al.  Semantic analysis for video contents extraction—spotting by association in news video , 1997, MULTIMEDIA '97.

[9]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[10]  Zhu Liu,et al.  Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..

[11]  Alexander G. Hauptmann,et al.  Learning to Recognize Speech by Watching Television , 1999, IEEE Intell. Syst..

[12]  Wayne H. Wolf,et al.  Hidden Markov model parsing of video programs , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  James M. Rehg,et al.  Vision-based speaker detection using Bayesian networks , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[14]  H. Vincent Poor,et al.  An Introduction to Signal Detection and Estimation , 1994, Springer Texts in Electrical Engineering.

[15]  Hayit Greenspan,et al.  A Continuous Probabilistic Framework for Image Matching , 2001, Comput. Vis. Image Underst..

[16]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Michael I. Jordan,et al.  Factorial Hidden Markov Models , 1995, Machine Learning.

[18]  Milind R. Naphade,et al.  Discovering recurrent events in video using unsupervised methods , 2002, Proceedings. International Conference on Image Processing.

[19]  Daniel Patrick Whittlesey Ellis,et al.  Prediction-driven computational auditory scene analysis , 1996 .

[20]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[21]  Tsuhan Chen,et al.  Audio-visual integration in multimodal communication , 1998, Proc. IEEE.

[22]  M. Ibrahim Sezan,et al.  A computational approach to semantic event detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[23]  Milind R. Naphade,et al.  Probabilistic Semantic Video Indexing , 2000, NIPS.

[24]  David S. Doermann,et al.  Identifying sports videos using replay, text, and camera motion features , 1999, Electronic Imaging.

[25]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[26]  Tsuhan Chen,et al.  Audio Feature Extraction and Analysis for Scene Segmentation and Classification , 1998, J. VLSI Signal Process..

[27]  Shih-Fu Chang,et al.  Spatio-temporal video search using the object based video representation , 1997, Proceedings of International Conference on Image Processing.

[28]  A. Murat Tekalp,et al.  Probabilistic Analysis and Extraction of Video Content , 1999, ICIP.

[29]  Milind R. Naphade,et al.  Inferring semantic concepts for video indexing and retrieval , 2000, Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101).

[30]  Takeo Kanade,et al.  elligent Access Video: formedia Project , 1996 .

[31]  John S. Boreczky,et al.  Finding presentations in recorded meetings using audio and video features , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[32]  Milind R. Naphade,et al.  Supporting audiovisual query using dynamic programming , 2001, MULTIMEDIA '01.

[33]  Jonathan D. Courtney Automatic video indexing via object motion analysis , 1997, Pattern Recognit..

[34]  T.S. Huang,et al.  Recognizing high-level audio-visual concepts using context , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[35]  Shih-Fu Chang,et al.  Video scene segmentation using video and audio features , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[36]  Thomas S. Huang,et al.  A probablistic framework for mapping audio-visual features to high-level semantics in terms of concepts and context , 2001 .

[37]  Thomas S. Huang,et al.  Image classification using a set of labeled and unlabeled images , 2000, SPIE Optics East.

[38]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..

[39]  Milind R. Naphade,et al.  Stochastic modeling of soundtrack for efficient segmentation and indexing of video , 1999, Electronic Imaging.

[40]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[41]  Hiroshi Hamada,et al.  Video Handling with Music and Speech Detection , 1998, IEEE Multim..

[42]  Erling H. Wold,et al.  Content-Based Search, and Retrieval of Audio , 1996 .

[43]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[44]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[45]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[46]  Nuno Vasconcelos,et al.  On the complexity of probabilistic image retrieval , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[47]  Sanjeev R. Kulkarni,et al.  Automated analysis and annotation of basketball video , 1997, Electronic Imaging.

[48]  Thomas S. Huang,et al.  Audio-visual event detection using duration dependent input output Markov models , 2001, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001).

[49]  K. Ramchandran,et al.  A factor graph framework for semantic indexing and retrieval in video , 2000, 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries.

[50]  Anil K. Jain,et al.  On image classification: city vs. landscape , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[51]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[52]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[53]  S. Kullback,et al.  Information Theory and Statistics , 1959 .