Diarization, Localization and Indexing of Meeting Archives

[1]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  P. Jonathon Phillips,et al.  Face recognition vendor test 2002 , 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[3]  Martial Michel,et al.  The NIST Meeting Room Pilot Corpus , 2004, LREC.

[4]  Farzin Deravi,et al.  Design issues for a digital audio-visual integrated database , 1996 .

[5]  Jonathan G. Fiscus,et al.  The Rich Transcription 2005 Spring Meeting Recognition Evaluation , 2005, MLMI.

[6]  Malcolm Slaney,et al.  FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks , 2000, NIPS.

[7]  E. Mayoraz,et al.  Fusion of face and speech data for person identity verification , 1999, IEEE Trans. Neural Networks.

[8]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[9]  Arun Ross,et al.  An introduction to biometric recognition , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  Ramani Duraiswami,et al.  Accelerated speech source localization via a hierarchical search of steered response power , 2004, IEEE Transactions on Speech and Audio Processing.

[11]  Gopal Sarma Pingali,et al.  Audio-visual tracking for natural interactivity , 1999, MULTIMEDIA '99.

[12]  Bojan Cukic,et al.  A Classification Approach to Multi-biometric Score Fusion , 2005, AVBPA.

[13]  Paris Smaragdis,et al.  AUDIO/VISUAL INDEPENDENT COMPONENTS , 2003 .

[14]  Arun Ross,et al.  Information fusion in biometrics , 2003, Pattern Recognit. Lett..

[15]  Patrick Pérez,et al.  Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking , 2001, ICCV.

[16]  Yong Rui,et al.  Real-time speaker tracking using particle filter sensor fusion , 2004, Proceedings of the IEEE.

[17]  Jean-Philippe Thiran,et al.  The BANCA Database and Evaluation Protocol , 2003, AVBPA.

[18]  Arun Ross,et al.  Score normalization in multimodal biometric systems , 2005, Pattern Recognit..

[19]  Jean-François Bonastre,et al.  Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..

[20]  Roberto Brunelli,et al.  Person identification using multiple cues , 1995, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Alvin F. Martin,et al.  NIST's Assessment of Text Independent Speaker Recognition Performance , 2002 .

[22]  Alex Pentland,et al.  Looking at People: Sensing for Ubiquitous and Wearable Computing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Sylvain Meignier,et al.  SPEAKER DIARIZATION IN THE ELISA CONSORTIUM OVER THE LAST 4 YEARS , 2004 .

[24]  Trevor Darrell,et al.  Multiple person and speaker activity tracking with a particle filter , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25]  Sudeep Sarkar,et al.  Audio Segmentation and Speaker Localization in Meeting Videos , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[26]  Naoyuki Ichimura,et al.  An Application of a Particle Filter to Bayesian Multiple Sound Source Tracking with Audio and Video Information Fusion , 2004 .

[27]  Darren B. Ward,et al.  Particle filtering algorithms for tracking an acoustic source in a reverberant environment , 2003, IEEE Trans. Speech Audio Process..

[28]  Sudeep Sarkar,et al.  Supervised Learning of Large Perceptual Organization: Graph Spectral Partitioning and Learning Automata , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Terence Sim,et al.  The CMU Pose, Illumination, and Expression Database , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Robert Snelick,et al.  Large-Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art Systems , 2005 .

[32]  Bernadette Dorizzi,et al.  Multimodal biometric score fusion: The Mean Rule vs. support vector classifiers , 2005, 2005 13th European Signal Processing Conference.

[33]  Rolf Ingold,et al.  MYIDEA - MULTIMODAL BIOMETRICS DATABASE, DESCRIPTION OF ACQUISITION PROTOCOLS , 2005 .

[34]  Jan Giebel,et al.  Shape-based pedestrian detection and tracking , 2002, Intelligent Vehicle Symposium, 2002. IEEE.

[35]  Francis Quek,et al.  Gesture cues for conversational interaction in monocular video , 1999, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378).

[36]  M. A. Siegler,et al.  Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .

[37]  Anil K. Jain,et al.  Likelihood Ratio-Based Biometric Score Fusion , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Trevor Darrell,et al.  Probabalistic Models and Informative Subspaces for Audiovisual Correspondence , 2002, ECCV.

[39]  Kentaro Toyama,et al.  Wallflower: principles and practice of background maintenance , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[40]  Nikos Fakotakis,et al.  Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task , 2007 .

[41]  Rainer Stiefelhagen,et al.  The CLEAR 2006 Evaluation , 2006, CLEAR.

[42]  Tieniu Tan,et al.  Recent developments in human motion analysis , 2003, Pattern Recognit..

[43]  Frédéric Bimbot,et al.  Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs , 2004, INTERSPEECH.

[44]  Carlos Busso,et al.  Smart room: participant and speaker localization and identification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[45]  Mark J. F. Gales,et al.  The Cambridge University March 2005 speaker diarisation system , 2005, INTERSPEECH.

[46]  Til Aach,et al.  Detection and recognition of moving objects using statistical motion detection and Fourier descriptors , 2003, 12th International Conference on Image Analysis and Processing, 2003.Proceedings..

[47]  Guillaume Lathoud,et al.  A sector-based, frequency-domain approach to detection and localization of multiple speakers , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[48]  Tanja Schultz,et al.  Speaker segmentation and clustering in meetings , 2004, INTERSPEECH.

[49]  Jean-Luc Gauvain,et al.  Towards Using STT for Broadcast News Speaker Diarization , 2004 .

[50]  Sudeep Sarkar,et al.  An outdoor biometric system: evaluation of normalization fusion schemes for face and voice , 2006, SPIE Defense + Commercial Sensing.

[51]  Patrick Kenny,et al.  Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations , 2007, IEEE Signal Processing Letters.

[52]  Rashid Ansari,et al.  Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures , 2002, 2002 11th European Signal Processing Conference.

[53]  Patrick J. Flynn,et al.  Using multiple gallery and probe images per person to improve performance of face recognition , 2003 .

[54]  U. Uludag,et al.  Multimodal Biometric Authentication Methods: A COTS Approach , 2003 .

[55]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[56]  Javier R. Movellan,et al.  Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.

[57]  Yasushi Yagi,et al.  Human detection in outdoor scene using spatio-temporal motion analysis , 2004, ICPR 2004.

[58]  David Zhang,et al.  Personal recognition using hand shape and texture , 2006, IEEE Transactions on Image Processing.

[59]  Hyeonjoon Moon,et al.  The FERET Evaluation Methodology for Face-Recognition Algorithms , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[60]  Sabri Gurbuz,et al.  Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus , 2002, EURASIP J. Adv. Signal Process..

[61]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[62]  Javier Ortega-Garcia,et al.  Multimodal biometric databases: an overview , 2006 .

[63]  Larry S. Davis,et al.  Multimodal 3-D tracking and event detection via the particle filter , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[64]  Hsin-Min Wang,et al.  A sequential metric-based audio segmentation method via the Bayesian information criterion , 2003, INTERSPEECH.

[65]  Trevor Darrell,et al.  A multi-modal approach for determining speaker location and focus , 2003, ICMI '03.

[66]  Kuldip K. Paliwal,et al.  Information Fusion and Person Verification Using Speech & Face Information , 2002 .

[67]  Alan Mink,et al.  Multimodal Biometric Authentication Methods: A COTS Approach , 2003 .

[68]  M. Viberg,et al.  Two decades of array signal processing research: the parametric approach , 1996, IEEE Signal Process. Mag..

[69]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[70]  Ramesh A. Gopinath,et al.  Improved speaker segmentation and segments clustering using the bayesian information criterion , 1999, EUROSPEECH.

[71]  Wei-Yun Yau,et al.  A Bayesian Framework for Robust Human Detection and Occlusion Handling using Human Shape Model , 2004, International Conference on Pattern Recognition.

[72]  A. Murat Tekalp,et al.  Combined Gesture-Speech Analysis and Speech Driven Gesture Synthesis , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[73]  Jianpeng Zhou,et al.  Real Time Robust Human Detection and Tracking System , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[74]  David Chandler,et al.  Biometric Product Testing Final Report , 2001 .

[75]  Barbara Peskin,et al.  TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM , 2004 .

[76]  D A Reynolds,et al.  The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations , 2004 .

[77]  Xavier Anguera Miró,et al.  Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System , 2006, MLMI.

[78]  Aggelos K. Katsaggelos,et al.  Audio-Visual Biometrics , 2006, Proceedings of the IEEE.

[79]  Anoop Gupta,et al.  Distributed meetings: a meeting capture and broadcasting system , 2002, MULTIMEDIA '02.

[80]  Julian Fiérrez,et al.  Adapted user-dependent multimodal biometric authentication exploiting general information , 2005, Pattern Recognit. Lett..

[81]  G. Carter,et al.  The generalized correlation method for estimation of time delay , 1976 .

[82]  S. Chen,et al.  Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[83]  Sudeep Sarkar,et al.  Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[84]  Michael Elad,et al.  Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[85]  Vladimir Pavlovic,et al.  Multimodal speaker detection using error feedback dynamic Bayesian networks , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[86]  Xudong Jiang,et al.  Exploiting global and local decisions for multimodal biometrics verification , 2004, IEEE Transactions on Signal Processing.

[87]  S. Ribaric,et al.  Experimental Evaluation of Matching-Score Normalization Techniques on Different Multimodal Biometric Systems , 2006, MELECON 2006 - 2006 IEEE Mediterranean Electrotechnical Conference.

[88]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[89]  Mohammed Yeasin,et al.  Prosody based co-analysis for continuous recognition of coverbal gestures , 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.

[90]  Anil K. Jain,et al.  Quality-based Score Level Fusion in Multibiometric Systems , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[91]  Gérard Chollet,et al.  BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities , 2003, AVBPA.

[92]  Patrick J. Flynn,et al.  An evaluation of multimodal 2D+3D face biometrics , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .

[94]  Fatih Murat Porikli,et al.  Achieving real-time object detection and tracking under extreme conditions , 2006, Journal of Real-Time Image Processing.

[95]  Arun Ross,et al.  Learning user-specific parameters in a multibiometric system , 2002, Proceedings. International Conference on Image Processing.

[96]  Yasushi Yagi,et al.  Human detection in outdoor scene using spatio-temporal motion analysis , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[97]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[98]  Wei-Yun Yau,et al.  Combination of hyperbolic functions for multimodal biometrics data fusion , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[99]  Michael Shapiro Brandstein,et al.  A framework for speech source localization using sensor arrays , 1995 .

[100]  Andrew Blake,et al.  Nonlinear filtering for speaker tracking in noisy and reverberant environments , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[101]  H. Sidenbladh,et al.  Detecting human motion with support vector machines , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[102]  B.Y. Smolenski,et al.  Generic Modeling Applied to Speaker Count , 2006, 2006 International Symposium on Intelligent Signal Processing and Communications.

[103]  Pierre Vandergheynst,et al.  Analysis of multimodal sequences using geometric video representations , 2006, Signal Process..

[104]  Steve Young,et al.  Segment generation and clustering in the HTK broadcast news transcription system , 1998 .

[105]  Hyunwoo Kim,et al.  Real-time multiple people detection using skin color, motion and appearance information , 2004, RO-MAN 2004. 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No.04TH8759).

[106]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[107]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[108]  Trevor Darrell,et al.  Audio-visual Segmentation and "The Cocktail Party Effect" , 2000, ICMI.

[109]  Vladimir Vezhnevets,et al.  A Survey on Pixel-Based Skin Color Detection Techniques , 2003 .

[110]  Nebojsa Jojic,et al.  A Graphical Model for Audiovisual Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[111]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[112]  Michael R. M. Jenkin,et al.  Audiovisual localization of multiple speakers in a video teleconferencing setting , 2003, Int. J. Imaging Syst. Technol..

[113]  Jitendra Ajmera,et al.  A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[114]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[115]  Harriet J. Nock,et al.  Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study , 2003, CIVR.

[116]  Sudeep Sarkar,et al.  Clip retrieval using multi-modal biometrics in meeting archives , 2008, 2008 19th International Conference on Pattern Recognition.

[117]  Jean-François Bonastre,et al.  E-HMM approach for learning and adapting sound models for speaker indexing , 2001, Odyssey.

[118]  Anil K. Jain,et al.  Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.