Paper information - Diarization, Localization and Indexing of Meeting Archives

Diarization, Localization and Indexing of Meeting Archives

[1] IEEE Xplore,et al. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] P. Jonathon Phillips,et al. Face recognition vendor test 2002 , 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[3] Martial Michel,et al. The NIST Meeting Room Pilot Corpus , 2004, LREC.

[4] Farzin Deravi,et al. Design issues for a digital audio-visual integrated database , 1996 .

[5] Jonathan G. Fiscus,et al. The Rich Transcription 2005 Spring Meeting Recognition Evaluation , 2005, MLMI.

[6] Malcolm Slaney,et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks , 2000, NIPS.

[7] E. Mayoraz,et al. Fusion of face and speech data for person identity verification , 1999, IEEE Trans. Neural Networks.

[8] Paul Over,et al. Evaluation campaigns and TRECVid , 2006, MIR '06.

[9] Arun Ross,et al. An introduction to biometric recognition , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[10] Ramani Duraiswami,et al. Accelerated speech source localization via a hierarchical search of steered response power , 2004, IEEE Transactions on Speech and Audio Processing.

[11] Gopal Sarma Pingali,et al. Audio-visual tracking for natural interactivity , 1999, MULTIMEDIA '99.

[12] Bojan Cukic,et al. A Classification Approach to Multi-biometric Score Fusion , 2005, AVBPA.

[13] Paris Smaragdis,et al. AUDIO/VISUAL INDEPENDENT COMPONENTS , 2003 .

[14] Arun Ross,et al. Information fusion in biometrics , 2003, Pattern Recognit. Lett..

[15] Patrick Pérez,et al. Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking , 2001, ICCV.

[16] Yong Rui,et al. Real-time speaker tracking using particle filter sensor fusion , 2004, Proceedings of the IEEE.

[17] Jean-Philippe Thiran,et al. The BANCA Database and Evaluation Protocol , 2003, AVBPA.

[18] Arun Ross,et al. Score normalization in multimodal biometric systems , 2005, Pattern Recognit..

[19] Jean-François Bonastre,et al. Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..

[20] Roberto Brunelli,et al. Person identification using multiple cues , 1995, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Alvin F. Martin,et al. NIST's Assessment of Text Independent Speaker Recognition Performance , 2002 .

[22] Alex Pentland,et al. Looking at People: Sensing for Ubiquitous and Wearable Computing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[23] Sylvain Meignier,et al. SPEAKER DIARIZATION IN THE ELISA CONSORTIUM OVER THE LAST 4 YEARS , 2004 .

[24] Trevor Darrell,et al. Multiple person and speaker activity tracking with a particle filter , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25] Sudeep Sarkar,et al. Audio Segmentation and Speaker Localization in Meeting Videos , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[26] Naoyuki Ichimura,et al. An Application of a Particle Filter to Bayesian Multiple Sound Source Tracking with Audio and Video Information Fusion , 2004 .

[27] Darren B. Ward,et al. Particle filtering algorithms for tracking an acoustic source in a reverberant environment , 2003, IEEE Trans. Speech Audio Process..

[28] Sudeep Sarkar,et al. Supervised Learning of Large Perceptual Organization: Graph Spectral Partitioning and Learning Automata , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[29] Anil K. Jain,et al. Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[30] Terence Sim,et al. The CMU Pose, Illumination, and Expression Database , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[31] Robert Snelick,et al. Large-Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art Systems , 2005 .

[32] Bernadette Dorizzi,et al. Multimodal biometric score fusion: The Mean Rule vs. support vector classifiers , 2005, 2005 13th European Signal Processing Conference.

[33] Rolf Ingold,et al. MYIDEA - MULTIMODAL BIOMETRICS DATABASE, DESCRIPTION OF ACQUISITION PROTOCOLS , 2005 .

[34] Jan Giebel,et al. Shape-based pedestrian detection and tracking , 2002, Intelligent Vehicle Symposium, 2002. IEEE.

[35] Francis Quek,et al. Gesture cues for conversational interaction in monocular video , 1999, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378).

[36] M. A. Siegler,et al. Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .

[37] Anil K. Jain,et al. Likelihood Ratio-Based Biometric Score Fusion , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38] Trevor Darrell,et al. Probabilistic Models and Informative Subspaces for Audiovisual Correspondence , 2002, ECCV.

[39] Kentaro Toyama,et al. Wallflower: principles and practice of background maintenance , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[40] Nikos Fakotakis,et al. Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task , 2007 .

[41] Rainer Stiefelhagen,et al. The CLEAR 2006 Evaluation , 2006, CLEAR.

[42] Tieniu Tan,et al. Recent developments in human motion analysis , 2003, Pattern Recognit..

[43] Frédéric Bimbot,et al. Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs , 2004, INTERSPEECH.

[44] Carlos Busso,et al. Smart room: participant and speaker localization and identification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[45] Mark J. F. Gales,et al. The Cambridge University March 2005 speaker diarisation system , 2005, INTERSPEECH.

[46] Til Aach,et al. Detection and recognition of moving objects using statistical motion detection and Fourier descriptors , 2003, 12th International Conference on Image Analysis and Processing, 2003.Proceedings..

[47] Guillaume Lathoud,et al. A sector-based, frequency-domain approach to detection and localization of multiple speakers , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[48] Tanja Schultz,et al. Speaker segmentation and clustering in meetings , 2004, INTERSPEECH.

[49] Jean-Luc Gauvain,et al. Towards Using STT for Broadcast News Speaker Diarization , 2004 .

[50] Sudeep Sarkar,et al. An outdoor biometric system: evaluation of normalization fusion schemes for face and voice , 2006, SPIE Defense + Commercial Sensing.

[51] Patrick Kenny,et al. Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations , 2007, IEEE Signal Processing Letters.

[52] Rashid Ansari,et al. Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures , 2002, 2002 11th European Signal Processing Conference.

[53] Patrick J. Flynn,et al. Using multiple gallery and probe images per person to improve performance of face recognition , 2003 .

[54] U. Uludag,et al. Multimodal Biometric Authentication Methods: A COTS Approach , 2003 .

[55] Paul A. Viola,et al. Detecting Pedestrians Using Patterns of Motion and Appearance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[56] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.

[57] Yasushi Yagi,et al. Human detection in outdoor scene using spatio-temporal motion analysis , 2004, ICPR 2004.

[58] David Zhang,et al. Personal recognition using hand shape and texture , 2006, IEEE Transactions on Image Processing.

[59] Hyeonjoon Moon,et al. The FERET Evaluation Methodology for Face-Recognition Algorithms , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[60] Sabri Gurbuz,et al. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus , 2002, EURASIP J. Adv. Signal Process..

[61] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[62] Javier Ortega-Garcia,et al. Multimodal biometric databases: an overview , 2006 .

[63] Larry S. Davis,et al. Multimodal 3-D tracking and event detection via the particle filter , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[64] Hsin-Min Wang,et al. A sequential metric-based audio segmentation method via the Bayesian information criterion , 2003, INTERSPEECH.

[65] Trevor Darrell,et al. A multi-modal approach for determining speaker location and focus , 2003, ICMI '03.

[66] Kuldip K. Paliwal,et al. Information Fusion and Person Verification Using Speech & Face Information , 2002 .

[67] Alan Mink,et al. Multimodal Biometric Authentication Methods: A COTS Approach , 2003 .

[68] M. Viberg,et al. Two decades of array signal processing research: the parametric approach , 1996, IEEE Signal Process. Mag..

[69] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[70] Ramesh A. Gopinath,et al. Improved speaker segmentation and segments clustering using the bayesian information criterion , 1999, EUROSPEECH.

[71] Wei-Yun Yau,et al. A Bayesian Framework for Robust Human Detection and Occlusion Handling using Human Shape Model , 2004, International Conference on Pattern Recognition.

[72] A. Murat Tekalp,et al. Combined Gesture-Speech Analysis and Speech Driven Gesture Synthesis , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[73] Jianpeng Zhou,et al. Real Time Robust Human Detection and Tracking System , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[74] David Chandler,et al. Biometric Product Testing Final Report , 2001 .

[75] Barbara Peskin,et al. TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM , 2004 .

[76] D A Reynolds,et al. The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations , 2004 .

[77] Xavier Anguera Miró,et al. Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System , 2006, MLMI.

[78] Aggelos K. Katsaggelos,et al. Audio-Visual Biometrics , 2006, Proceedings of the IEEE.

[79] Anoop Gupta,et al. Distributed meetings: a meeting capture and broadcasting system , 2002, MULTIMEDIA '02.

[80] Julian Fiérrez,et al. Adapted user-dependent multimodal biometric authentication exploiting general information , 2005, Pattern Recognit. Lett..

[81] G. Carter,et al. The generalized correlation method for estimation of time delay , 1976 .

[82] S. Chen,et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[83] Sudeep Sarkar,et al. Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[84] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[85] Vladimir Pavlovic,et al. Multimodal speaker detection using error feedback dynamic Bayesian networks , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[86] Xudong Jiang,et al. Exploiting global and local decisions for multimodal biometrics verification , 2004, IEEE Transactions on Signal Processing.

[87] S. Ribaric,et al. Experimental Evaluation of Matching-Score Normalization Techniques on Different Multimodal Biometric Systems , 2006, MELECON 2006 - 2006 IEEE Mediterranean Electrotechnical Conference.

[88] Jiri Matas,et al. On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[89] Mohammed Yeasin,et al. Prosody based co-analysis for continuous recognition of coverbal gestures , 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.

[90] Anil K. Jain,et al. Quality-based Score Level Fusion in Multibiometric Systems , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[91] Gérard Chollet,et al. BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities , 2003, AVBPA.

[92] Patrick J. Flynn,et al. An evaluation of multimodal 2D+3D face biometrics , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93] Jiri Matas,et al. XM2VTSDB: The Extended M2VTS Database , 1999 .

[94] Fatih Murat Porikli,et al. Achieving real-time object detection and tracking under extreme conditions , 2006, Journal of Real-Time Image Processing.

[95] Arun Ross,et al. Learning user-specific parameters in a multibiometric system , 2002, Proceedings. International Conference on Image Processing.

[96] Yasushi Yagi,et al. Human detection in outdoor scene using spatio-temporal motion analysis , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[97] G. Schwarz. Estimating the Dimension of a Model , 1978 .

[98] Wei-Yun Yau,et al. Combination of hyperbolic functions for multimodal biometrics data fusion , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[99] Michael Shapiro Brandstein,et al. A framework for speech source localization using sensor arrays , 1995 .

[100] Andrew Blake,et al. Nonlinear filtering for speaker tracking in noisy and reverberant environments , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[101] H. Sidenbladh,et al. Detecting human motion with support vector machines , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[102] B.Y. Smolenski,et al. Generic Modeling Applied to Speaker Count , 2006, 2006 International Symposium on Intelligent Signal Processing and Communications.

[103] Pierre Vandergheynst,et al. Analysis of multimodal sequences using geometric video representations , 2006, Signal Process..

[104] Steve Young,et al. Segment generation and clustering in the HTK broadcast news transcription system , 1998 .

[105] Hyunwoo Kim,et al. Real-time multiple people detection using skin color, motion and appearance information , 2004, RO-MAN 2004. 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No.04TH8759).

[106] Dariu Gavrila,et al. The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[107] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[108] Trevor Darrell,et al. Audio-visual Segmentation and "The Cocktail Party Effect" , 2000, ICMI.

[109] Vladimir Vezhnevets,et al. A Survey on Pixel-Based Skin Color Detection Techniques , 2003 .

[110] Nebojsa Jojic,et al. A Graphical Model for Audiovisual Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[111] Alex Pentland,et al. Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[112] Michael R. M. Jenkin,et al. Audiovisual localization of multiple speakers in a video teleconferencing setting , 2003, Int. J. Imaging Syst. Technol..

[113] Jitendra Ajmera,et al. A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[114] Jake K. Aggarwal,et al. Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[115] Harriet J. Nock,et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study , 2003, CIVR.

[116] Sudeep Sarkar,et al. Clip retrieval using multi-modal biometrics in meeting archives , 2008, 2008 19th International Conference on Pattern Recognition.

[117] Jean-François Bonastre,et al. E-HMM approach for learning and adapting sound models for speaker indexing , 2001, Odyssey.

[118] Anil K. Jain,et al. Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.