Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
[1] J. Tangney,et al. Too Good to Be True: Bots and Bad Data From Mechanical Turk , 2022, Perspectives on Psychological Science: A Journal of the Association for Psychological Science.
[2] U. Noppeney. Perceptual Inference, Learning, and Attention in a Multisensory World , 2021, Annual Review of Neuroscience.
[3] Andrew Zisserman,et al. A Short Note on the Kinetics-700-2020 Human Action Dataset , 2020, ArXiv.
[4] Andrew Zisserman,et al. The AVA-Kinetics Localized Human Actions Video Dataset , 2020, ArXiv.
[5] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[6] Ale Smidts,et al. Neural similarity at temporal lobe and cerebellum predicts out-of-sample preference and recall for video stimuli , 2019, NeuroImage.
[7] Radoslaw Martin Cichy,et al. Deep Neural Networks as Scientific Models , 2019, Trends in Cognitive Sciences.
[8] Sean A. Dennis,et al. Online Worker Fraud and Evolving Threats to the Integrity of MTurk Data: A Discussion of Virtual Private Servers and the Limitations of IP-Based Screening Procedures , 2019, Behavioral Research in Accounting.
[9] J. Gray,et al. PsychoPy2: Experiments in behavior made easy , 2019, Behavior Research Methods.
[10] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[12] Matthias Bethge,et al. Generalisation in humans and deep neural networks , 2018, NeurIPS.
[13] Daniel L. K. Yamins,et al. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy , 2018, Neuron.
[14] Bolei Zhou,et al. Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Maja Pantic,et al. End-to-End Audiovisual Fusion with LSTMs , 2017, AVSP.
[17] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[18] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[23] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[24] Kevin Crowston,et al. Amazon Mechanical Turk: A Research Tool for Organizations and Information Systems Scholars , 2012, Shaping the Future of ICT Research.
[25] Uta Noppeney,et al. Physical and Perceptual Factors Shape the Neural Mechanisms That Integrate Audiovisual Signals in Speech Comprehension , 2011, The Journal of Neuroscience.
[26] U. Noppeney,et al. Perceptual Decisions Formed by Accumulation of Audiovisual Evidence in Prefrontal Cortex , 2010, The Journal of Neuroscience.
[27] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Guido Rossum,et al. Python Reference Manual , 2000.
[29] S. Hochreiter,et al. Long Short-Term Memory , 1997, Neural Computation.
[30] Iain D Gilchrist,et al. Perception of differences in naturalistic dynamic scenes, and a V1-based model , 2015, Journal of Vision.