Deformable patch embedding-based shift module-enhanced transformer for panoramic action recognition
[1] Gunhee Kim,et al. Panoramic Vision Transformer for Saliency Detection in 360° Videos , 2022, ECCV.
[2] Ruize Han,et al. Panoramic Human Activity Recognition , 2022, ECCV.
[3] R. Stiefelhagen,et al. Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Guosheng Hu,et al. DPT: Deformable Patch-based Transformer for Visual Recognition , 2021, ACM Multimedia.
[5] Jakob Uszkoreit,et al. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers , 2021, Trans. Mach. Learn. Res.
[6] Anima Anandkumar,et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers , 2021, NeurIPS.
[7] Ivan Marsic,et al. VidTr: Video Transformer Without Convolutions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Matthieu Cord,et al. Going deeper with Image Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Lihi Zelnik-Manor,et al. An Image is Worth 16x16 Words, What is a Video Worth? , 2021, ArXiv.
[11] Heng Wang,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[12] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[13] Limin Wang,et al. TDN: Temporal Difference Networks for Efficient Action Recognition , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Klaus Dietmayer,et al. Point Transformer , 2020, IEEE Access.
[15] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[16] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.
[17] Hugo Latapie,et al. EgoK360: A 360 Egocentric Kinetic Human Activity Video Dataset , 2020, 2020 IEEE International Conference on Image Processing (ICIP).
[18] Baoxin Li,et al. Unsupervised Learning of Optical Flow With CNN-Based Non-Local Filtering , 2020, IEEE Transactions on Image Processing.
[19] Willem Zuidema,et al. Quantifying Attention Flow in Transformers , 2020, ACL.
[20] Junnan Li,et al. Weakly-Supervised Multi-Person Action Recognition in 360° Videos , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[21] Jan-Michael Frahm,et al. Tangent Images for Mitigating Spherical Distortion , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] O. Lanz,et al. Gate-Shift Networks for Video Action Recognition , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Matthias Nießner,et al. Spherical CNNs on Unstructured Grids , 2019, ICLR.
[24] Baoxin Li,et al. Action-Stage Emphasized Spatiotemporal VLAD for Video Action Recognition , 2019, IEEE Transactions on Image Processing.
[25] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Sergio Escalera,et al. LSTA: Long Short-Term Attention for Egocentric Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Kuk-jin Yoon,et al. SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[30] Rafael Monroy,et al. SalNet360: Saliency Maps for omni-directional images with CNN , 2017, Signal Process. Image Commun.
[31] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[32] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Limin Wang,et al. Temporal Segment Networks for Action Recognition in Videos , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Yi Li,et al. Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[36] Thomas Brox,et al. High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.
[37] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[38] Zehdreh Allen-Lafayette,et al. Flattening the Earth, Two Thousand Years of Map Projections , 1998.
[39] Han Zhang,et al. Visual Indoor Navigation Using Mobile Augmented Reality , 2022, CGI.
[40] Connelly Barnes,et al. Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion , 2022, ArXiv.
[41] Wuzhen Shi,et al. Partially Occluded Skeleton Action Recognition Based on Multi-stream Fusion Graph Convolutional Networks , 2021, CGI.
[42] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Zhaoxin Li,et al. Monocular Dense SLAM with Consistent Deep Depth Prediction , 2021, CGI.