Multimodal fusion for multimedia analysis: a survey
Mohan S. Kankanhalli | Pradeep K. Atrey | Abdulmotaleb El-Saddik | M. Anwar Hossain
[1] Larry S. Davis,et al. Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[2] John Zimmerman,et al. A probabilistic layered framework for integrating multimedia content and context information , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[3] Chin-Hui Lee,et al. A Multi-Modal Approach to Story Segmentation for News Video , 2003, World Wide Web.
[4] Wojciech Pieczynski,et al. Multisensor image segmentation using Dempster-Shafer fusion in Markov fields context , 2001, IEEE Trans. Geosci. Remote. Sens..
[5] P. Kuyper,et al. The cocktail party effect , 1972, Audiology : official organ of the International Society of Audiology.
[6] Stéphane Lafortune,et al. On an Optimization Problem in Sensor Selection , 2002, Discret. Event Dyn. Syst..
[7] Rong Yan,et al. Probabilistic models for combining diverse knowledge sources in multimedia retrieval , 2006 .
[8] Aggelos K. Katsaggelos,et al. Audio-Visual Biometrics , 2006, Proceedings of the IEEE.
[9] Richa Singh,et al. DS theory based fingerprint classifier fusion with update rule to minimize training time , 2006, IEICE Electron. Express.
[10] Jean-Luc Schwartz,et al. Models for audiovisual fusion in a noisy-vowel recognition task , 1997, Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing.
[11] Mark J. Buller,et al. Confidence-based data management for personal area sensor networks , 2004, DMSN '04.
[12] Huimin Chen,et al. Tracking of multiple moving speakers with multiple microphone arrays , 2004, IEEE Transactions on Speech and Audio Processing.
[13] Harriet J. Nock,et al. Assessing face and speech consistency for monologue detection in video , 2002, MULTIMEDIA '02.
[14] Petros Maragos,et al. Adaptive multimodal fusion by uncertainty compensation , 2006, INTERSPEECH.
[15] Léon J. M. Rothkrantz,et al. Facial Expression Recognition with Relevance Vector Machines , 2005, 2005 IEEE International Conference on Multimedia and Expo.
[16] G. Jaffré,et al. Audio/Video Fusion: a Preprocessing Step for Multimodal Person Identification , 2022 .
[17] Nebojsa Jojic,et al. A Graphical Model for Audiovisual Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..
[18] Shih-Fu Chang,et al. Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).
[19] R. Schroeder. LITERATURE SURVEY , 1981 .
[20] Y. Oshman. Optimal sensor selection strategy for discrete-time state estimators , 1994 .
[21] Edward Y. Chang,et al. Optimal multimodal fusion for multimedia data analysis , 2004, MULTIMEDIA '04.
[22] Noboru Babaguchi,et al. Personalized abstraction of broadcasted American football video by highlight selection , 2004, IEEE Transactions on Multimedia.
[23] Ning Xiong,et al. Multi-sensor management for information fusion: issues and approaches , 2002, Inf. Fusion.
[24] Norbert Pfleger,et al. FADE-An Integrated Approach to Multimodal Fusion and Discourse Processing , 2005 .
[25] Juergen Luettin,et al. Hierarchical discriminant features for audio-visual LVCSR , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[26] Jiri Matas,et al. On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..
[27] Arnaud Doucet,et al. A survey of convergence results on particle filtering methods for practitioners , 2002, IEEE Trans. Signal Process..
[28] Samy Bengio,et al. How do correlation and variance of base-experts affect fusion in biometric authentication tasks? , 2005, IEEE Transactions on Signal Processing.
[29] Jean-Philippe Thiran,et al. The BANCA Database and Evaluation Protocol , 2003, AVBPA.
[30] Mustapha Makkook,et al. A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition , 2007 .
[31] Jake K. Aggarwal,et al. Object tracking in an outdoor environment using fusion of features and cameras , 2006, Image Vis. Comput..
[32] Julian Fiérrez,et al. A Comparative Evaluation of Fusion Strategies for Multimodal Biometric Verification , 2003, AVBPA.
[33] S. Sridharan,et al. Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier , 2001, Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. ISIMP 2001 (IEEE Cat. No.01EX489).
[34] Reynold Cheng,et al. Sensor node selection for execution of continuous probabilistic queries in wireless sensor networks , 2004, VSSN '04.
[35] Mohan S. Kankanhalli,et al. Experience based sampling technique for multimedia analysis , 2003, MULTIMEDIA '03.
[36] Takeo Kanade,et al. Name-It: Naming and Detecting Faces in News Videos , 1999, IEEE Multim..
[37] Stéphane Ayache,et al. Classifier Fusion for SVM-Based Multimedia Semantic Indexing , 2007, ECIR.
[38] Mohan S. Kankanhalli,et al. Confidence Building Among Correlated Streams in Multimedia Surveillance Systems , 2007, MMM.
[39] Lizhong Xu,et al. An image recognition method based on multiple BP neural networks fusion , 2004, International Conference on Information Acquisition, 2004. Proceedings..
[40] Changsheng Xu,et al. Using Webcast Text for Semantic Event Detection in Broadcast Sports Video , 2008, IEEE Transactions on Multimedia.
[41] Bir Bhanu,et al. Tracking Humans using Multi-modal Fusion , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.
[42] T. Başar,et al. A New Approach to Linear Filtering and Prediction Problems , 2001 .
[43] Harriet J. Nock,et al. Semantic annotation of multimedia using maximum entropy models , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[44] Gian Luca Foresti,et al. A distributed sensor network for video surveillance of outdoor environments , 2002, Proceedings. International Conference on Image Processing.
[45] Sophie M. Wuerger,et al. Continuous audio-visual digit recognition using N-best decision fusion , 2004, Inf. Fusion.
[46] Mohan S. Kankanhalli,et al. Information assimilation framework for event detection in multimedia surveillance systems , 2006, Multimedia Systems.
[47] Edward Y. Chang,et al. Multimodal information fusion for video concept detection , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..
[48] Marcel Worring,et al. Multimodal Video Indexing: A Review of the State-of-the-art , 2001 .
[49] Shih-Fu Chang,et al. News video story segmentation using fusion of multi-level multi-modal features in TRECVID 2003 , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[50] Zhi-Hua Zhou. Learning with unlabeled data and its application to image retrieval , 2006 .
[51] Dingxing Wang,et al. Boosting image classification with LDA-based feature combination for digital photograph management , 2005, Pattern Recognit..
[52] Pradeep K. Atrey,et al. Modeling and assessing quality of information in multisensor multimedia monitoring systems , 2011, TOMCCAP.
[53] Esther de Ves,et al. Applying logistic regression to relevance feedback in image retrieval systems , 2007, Pattern Recognit..
[54] Kuldip K. Paliwal,et al. Identity verification using speech and face information , 2004, Digit. Signal Process..
[55] Paul Over,et al. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements , 2009 .
[56] M. Mehta,et al. MULTIMODAL INPUT FUSION IN HUMAN-COMPUTER INTERACTION On the Example of the NICE Project , 2003 .
[57] Stefan M. Rüger,et al. Information-theoretic semantic multimedia indexing , 2007, CIVR '07.
[58] Carlo S. Regazzoni,et al. From multi-sensor surveillance towards smart interactive spaces , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[59] Larry S. Davis,et al. Joint Audio-Visual Tracking Using Particle Filters , 2002, EURASIP J. Adv. Signal Process..
[60] Samy Bengio,et al. Multimodal Authentication Using Asynchronous HMMs , 2003, AVBPA.
[61] Witold Pedrycz,et al. Face recognition: A study in information fusion using fuzzy integral , 2005, Pattern Recognit. Lett..
[62] Sharon L. Oviatt,et al. Taming recognition errors with a multimodal interface , 2000, CACM.
[63] Arun Ross,et al. Score normalization in multimodal biometric systems , 2005, Pattern Recognit..
[64] Aristodemos Pnevmatikakis,et al. Real Time Audio-Visual Person Tracking , 2006, 2006 IEEE Workshop on Multimedia Signal Processing.
[65] Thanassis Rikakis,et al. Computational models for experiences in the arts, and multimedia , 2003, ETP '03.
[66] Jean-Marc Odobez,et al. Audio-visual speaker tracking with importance particle filters , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).
[67] Gérard Chollet,et al. Audio-Visual Speech Synchrony Measure for Talking-Face Identity Verification , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[68] John R. Smith,et al. Data Modeling Strategies for Imbalanced Learning in Visual Search , 2007, 2007 IEEE International Conference on Multimedia and Expo.
[69] A. Blake,et al. Sequential Monte Carlo fusion of sound and vision for speaker tracking , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[70] Benoît Maison,et al. Joint processing of audio and visual information for multimedia indexing and human-computer interaction , 2000, RIAO.
[71] Min Xu,et al. Efficient sampling of training set in large and noisy multimedia data , 2007, TOMCCAP.
[72] Ren C. Luo,et al. Multisensor fusion and integration: approaches, applications, and future research directions , 2002 .
[73] Zhu Liu,et al. Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..
[74] Shih-Fu Chang,et al. Layered dynamic mixture model for pattern discovery in asynchronous multi-modal streams [video applications] , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[75] Ellen M. Voorhees,et al. Learning collection fusion strategies , 1995, SIGIR '95.
[76] S. Son,et al. GROUP-BASED EVENT DETECTION IN UNDERSEA SENSOR NETWORKS , 2005 .
[77] Changsheng Xu,et al. A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video , 2008, IEEE Transactions on Multimedia.
[78] Chris Stauffer,et al. Automated Audio-visual Activity Analysis , 2005 .
[79] James Llinas,et al. An introduction to multisensor data fusion , 1997, Proc. IEEE.
[80] R. Manmatha,et al. Using Maximum Entropy for Automatic Image Annotation , 2004, CIVR.
[81] Ajay Divakaran. Multimedia Content Analysis: Theory and Applications , 2008 .
[82] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[83] John W. McDonough,et al. A joint particle filter for audio-visual speaker tracking , 2005, ICMI '05.
[84] Seppo Puuronen,et al. MULTILEVEL CONTEXT REPRESENTATION USING SEMANTIC METANETWORK , 1997 .
[85] Ishwar K. Sethi,et al. Multimedia content processing through cross-modal association , 2003, MULTIMEDIA '03.
[86] Mohan S. Kankanhalli,et al. Experiential Sampling on Multiple Data Streams , 2006, IEEE Transactions on Multimedia.
[87] Pradeep K. Atrey,et al. Smart mirror for ambient home environment , 2007 .
[88] Christopher Town,et al. Multi-sensory and Multi-modal Fusion for Sentient Computing , 2007, International Journal of Computer Vision.
[89] S. Iyengar,et al. Multi-Sensor Fusion: Fundamentals and Applications With Software , 1997 .
[90] Hai Leong Chieu,et al. Query based event extraction along a timeline , 2004, SIGIR '04.
[91] Harriet J. Nock,et al. Discriminative model fusion for semantic concept detection and annotation in video , 2003, ACM Multimedia.
[92] Denis Pellerin,et al. Video classification based on low-level feature fusion model , 2005, 2005 13th European Signal Processing Conference.
[93] Malcolm Slaney,et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks , 2000, NIPS.
[94] Ramesh Jain,et al. Experiential Sampling for video surveillance , 2003, IWVS '03.
[95] Sascha Spors,et al. Joint audio-video object localization and tracking , 2001 .
[96] Mel Siegel,et al. Sensor data fusion for context-aware computing using Dempster-Shafer theory , 2004 .
[97] John R. Smith,et al. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues , 2003, EURASIP J. Adv. Signal Process..
[98] Harriet J. Nock,et al. Audio-visual synchrony for detection of monologues in video archives , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[99] Ying Liu,et al. Integrating Semantic Templates with Decision Tree for Image Semantic Learning , 2007, MMM.
[100] Tom E. Bishop,et al. Blind Image Restoration Using a Block-Stationary Signal Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[101] Mei-Chen Yeh,et al. Multimodal fusion using learned text concepts for image categorization , 2006, MM '06.
[102] Jeffrey K. Uhlmann,et al. New extension of the Kalman filter to nonlinear systems , 1997, Defense, Security, and Sensing.
[103] Ishwar K. Sethi,et al. Audio-visual talking face detection , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[104] Rainer Stiefelhagen,et al. Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures , 2004, ICMI '04.
[105] Ruzena Bajcsy,et al. The Sensor Selection Problem for Bounded Uncertainty Sensing Models , 2005, IEEE Transactions on Automation Science and Engineering.
[106] Shih-Fu Chang,et al. Combining text and audio-visual features in video indexing , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[107] Uwe Aickelin,et al. Anomaly Detection Using the Dempster-Shafer Method , 2006, DMIN.
[108] Luis Mateus Rocha,et al. Singular value decomposition and principal component analysis , 2003 .
[109] Kevin P. Murphy,et al. Dynamic Bayesian Networks for Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..
[110] J. Jacko,et al. The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications , 2002 .
[111] Xian-Sheng Hua,et al. An Attention-Based Decision Fusion Scheme for Multimedia Information Retrieval , 2004, PCM.
[112] Eindhoven,et al. EP 1 881 486 B1, European Patent Specification; GB-A-2 353 926; C. Faller et al., "Efficient Representation of Spatial Audio Using Perceptual Parametrization", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics .
[113] Vlasta Radová,et al. An approach to speaker identification using multiple classifiers , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[114] Samy Bengio,et al. Database, protocols and tools for evaluating score-level fusion algorithms in biometric authentication , 2006, Pattern Recognit..
[115] Noboru Babaguchi,et al. Event based indexing of broadcasted sports video by intermodal collaboration , 2002, IEEE Trans. Multim..
[116] H.K. Ekenel,et al. Kalman filters for audio-video source localization , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..
[117] Azriel Rosenfeld,et al. Face recognition: A literature survey , 2003, CSUR.
[118] Shengbing Jiang,et al. Optimal sensor selection for discrete-event systems with partial observation , 2003, IEEE Trans. Autom. Control..
[119] Aggelos K. Katsaggelos,et al. Optimal sensor selection for video-based target tracking in a wireless sensor network , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..
[120] Vladimir Pavlovic,et al. Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection , 2002, Object recognition supported by user interaction for service robots.
[121] Norbert Pfleger,et al. Context based multimodal fusion , 2004, ICMI '04.
[122] Lianhong Cai,et al. Multi-level Fusion of Audio and Visual Features for Speaker Identification , 2006, ICB.
[123] Marcel Worring,et al. A review on multimodal video indexing , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.
[124] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[125] Yi Ding,et al. Segmental Hidden Markov Models for View-based Sport Video Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[126] Shih-Fu Chang,et al. Story boundary detection in large broadcast news video archives: techniques, experience and trends , 2004, MULTIMEDIA '04.
[127] Mel Siegel,et al. Confidence fusion [sensor fusion] , 2004, International Workshop on Robot Sensing, 2004. ROSE 2004..
[128] Rong Yan,et al. Learning query-class dependent weights in automatic video retrieval , 2004, MULTIMEDIA '04.
[129] Trevor Darrell,et al. Audio-visual Segmentation and "The Cocktail Party Effect" , 2000, ICMI.
[130] Pietro Perona,et al. A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[131] Mau-Tsuen Yang,et al. A multimodal fusion system for people detection and tracking , 2005, Int. J. Imaging Syst. Technol..
[132] Alan F. Smeaton,et al. A Comparison of Score, Rank and Probability-Based Fusion Methods for Video Shot Retrieval , 2005, CIVR.
[133] Michael Wagner,et al. Audio-visual multimodal fusion for biometric person authentication and liveness verification , 2006 .
[134] Mohan S. Kankanhalli,et al. Goal-oriented optimal subset selection of correlated multimedia streams , 2007, TOMCCAP.
[135] Chalapathy Neti,et al. Recent advances in the automatic recognition of audiovisual speech , 2003, Proc. IEEE.
[136] Bakkama Srinath Reddy,et al. Evidential Reasoning for Multimodal Fusion in Human Computer Interaction , 2007 .
[137] Zoubin Ghahramani,et al. A Unifying Review of Linear Gaussian Models , 1999, Neural Computation.
[138] Thijs Westerveld,et al. Image Retrieval: Content versus Context , 2000, RIAO.
[139] Christophe Andrieu,et al. Particle methods for change detection, system identification, and control , 2004, Proceedings of the IEEE.
[140] Sharon L. Oviatt,et al. Ten myths of multimodal interaction , 1999, Commun. ACM.
[141] Rich Caruana,et al. Getting the Most Out of Ensemble Selection , 2006, Sixth International Conference on Data Mining (ICDM'06).
[142] Nicu Sebe,et al. Multimodal Human Computer Interaction: A Survey , 2005, ICCV-HCI.
[143] Samy Bengio,et al. Confidence measures for multimodal identity verification , 2002, Inf. Fusion.
[144] Gérard Chollet,et al. Audiovisual Speech Synchrony Measure: Application to Biometrics , 2007, EURASIP J. Adv. Signal Process..
[145] Edward Y. Chang,et al. Multimodal metadata fusion using causal strength , 2005, ACM Multimedia.
[146] Nebojsa Jojic,et al. Audio-visual graphical models for speech processing , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[147] Tat-Seng Chua,et al. Fusion of AV features and external information sources for event detection in team sports video , 2006, TOMCCAP.
[148] Sharon Oviatt,et al. Multimodal Interfaces , 2008, Encyclopedia of Multimedia.
[149] Gérard Chollet,et al. BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities , 2003, AVBPA.
[150] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[151] J. B. Mena. COLOR IMAGE SEGMENTATION USING THE DEMPSTER-SHAFER THEORY OF EVIDENCE FOR THE FUSION OF TEXTURE , 2003 .
[152] Huosheng Hu,et al. CSM-422 Sensors and Data Fusion Algorithms in Mobile Robotics , 2005 .
[153] Shuzhi Sam Ge,et al. Motion estimation using audio and video fusion , 2004, ICARCV 2004 8th Control, Automation, Robotics and Vision Conference, 2004..
[154] Hong Yan,et al. Comparison of face verification results on the XM2VTS database , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.
[155] Harriet J. Nock,et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study , 2003, CIVR.
[156] Juan J. Igarza,et al. MCYT baseline corpus: a bimodal biometric database , 2003 .
[157] Mohan S. Kankanhalli,et al. Experiential Sampling in Multimedia Systems , 2006, IEEE Transactions on Multimedia.
[158] Ben J. A. Kröse,et al. EM detection of common origin of multi-modal cues , 2006, ICMI '06.