Seeing and Hearing Egocentric Actions: How Much Can We Learn?
Alejandro Cartas | Jordi Luque | Petia Radeva | Carlos Segura | Mariella Dimiccoli