RAVEL: an annotated corpus for training robots with audiovisual abilities
Radu Horaud | Xavier Alameda-Pineda | Kaustubh Kulkarni | Vojtech Franc | Jordi Sanchez-Riera | Jan Cech | Johannes Wienke | Antoine Deleforge
[1] Christopher Joseph Pal,et al. Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[2] Andrew Zisserman,et al. Multiple View Geometry in Computer Vision (2nd ed) , 2003 .
[3] Shaogang Gong,et al. Audio- and Video-based Biometric Person Authentication , 1997, Lecture Notes in Computer Science.
[4] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[5] J.N. Gowdy,et al. CUAVE: A new audio-visual database for multimodal human-computer interface research , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[6] Dima Damen,et al. Proceedings of the British Machine Vision Conference , 2014, BMVC 2014.
[7] Luc Van Gool,et al. Exemplar-based Action Recognition in Video , 2009, BMVC.
[8] David Elliott,et al. In the Wild , 2010 .
[9] Radu Horaud,et al. Scene flow estimation by growing correspondence seeds , 2011, CVPR 2011.
[10] E. C. Cmm,et al. on the Recognition of Speech, with , 2008 .
[11] Takeo Kanade,et al. Three-dimensional scene flow , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Ivan Laptev,et al. On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[13] Jon Barker,et al. An automatic speech recognition system based on the scene analysis account of auditory perception , 2007, Speech Commun.
[14] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
[15] Jiri Matas,et al. XM2VTSDB: The Extended M2VTS Database , 1999 .
[16] Chi-Ho Chan,et al. On the Results of the First Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation , 2010, ICPR Contests.
[17] Josh H. McDermott. The cocktail party problem , 2009, Current Biology.
[18] Radu Horaud,et al. Cyclopean Geometry of Binocular Vision , 2008, Journal of the Optical Society of America. A, Optics, image science, and vision.
[19] James R. Glass,et al. A segment-based audio-visual speech recognizer: data collection, development, and initial experiments , 2004, ICMI '04.
[20] Jon Barker,et al. The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements , 2008, ICMI '08.
[21] Yasser F. O. Mohammad,et al. The H3R Explanation Corpus human-human and base human-robot interaction dataset , 2008, 2008 International Conference on Intelligent Sensors, Sensor Networks and Information Processing.
[22] Jean-Marc Odobez,et al. AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking , 2004, MLMI.
[23] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[24] H. Opower. Multiple view geometry in computer vision , 2002 .
[25] Cordelia Schmid,et al. Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[26] Uwe D. Hanebeck,et al. The KIT Robo-kitchen data set for the evaluation of view-based activity recognition systems , 2011, 2011 11th IEEE-RAS International Conference on Humanoid Robots.
[27] Jean-Yves Bouguet,et al. Camera calibration toolbox for matlab , 2001 .
[28] Ronen Basri,et al. Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[29] Hennie Brugman,et al. Annotating Multi-media/Multi-modal Resources with ELAN , 2004, LREC.
[30] Zdenek Kalal,et al. Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Jean-Philippe Thiran,et al. The BANCA Database and Evaluation Protocol , 2003, AVBPA.
[32] Moritz Tenorth,et al. The TUM Kitchen Data Set of everyday manipulation activities for motion tracking and action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[33] Ren C. Luo,et al. Multisensor integration and fusion in intelligent systems , 1989, IEEE Trans. Syst. Man Cybern.
[34] Radu Horaud,et al. Conjugate Mixture Models for Clustering Multimodal Data , 2011, Neural Computation.
[35] Li Wang,et al. Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models , 2011, International Journal of Computer Vision.
[36] Munsang Kim,et al. Human-Robot Interaction in Real Environments by Audio-Visual Integration , 2007 .
[37] Fernando De la Torre,et al. Joint segmentation and classification of human actions in video , 2011, CVPR 2011.
[38] Rémi Ronfard,et al. Free viewpoint action recognition using motion history volumes , 2006, Comput. Vis. Image Underst.
[39] Jiebo Luo,et al. Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[40] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[41] Ben J. A. Kröse,et al. From Sensors to Human Spatial Concepts: An Annotated Data Set , 2008, IEEE Transactions on Robotics.
[42] Radu Horaud,et al. Finding audio-visual events in informal social gatherings , 2011, ICMI '11.