Moments in Time Dataset: One Million Videos for Event Understanding
Bolei Zhou | Lisa M. Brown | Quanfu Fan | Aude Oliva | Dan Gutfreund | Carl Vondrick | Sarah Adel Bargal | Mathew Monfort | Kandan Ramakrishnan | Alex Andonian | Tom Yan