Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language
[1] Yong Zhang,et al. T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations , 2023, ArXiv.
[2] Yang Yang,et al. Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition , 2022, IEEE Transactions on Circuits and Systems for Video Technology.
[3] Sungjoon Choi,et al. Learning Joint Representation of Human Motion and Language , 2022, ArXiv.
[4] Dahua Lin,et al. DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition , 2022, ArXiv.
[5] Amit H. Bermano,et al. Human Motion Diffusion Model , 2022, ICLR.
[6] Zhongang Cai,et al. MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] M. Dixit,et al. A comprehensive survey on human pose estimation approaches , 2022, Multimedia Systems.
[8] Marcella Cornia,et al. ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval , 2022, CBMI.
[9] Lisheng Wang,et al. Animating Images to Transfer CLIP for Video-Text Retrieval , 2022, SIGIR.
[10] Sen Wang,et al. TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts , 2022, ECCV.
[11] Andrea Esuli,et al. Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching , 2022, ArXiv.
[12] Sen Wang,et al. Generating Diverse and Natural 3D Human Motions from Text , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Yi Yang,et al. CenterCLIP: Token Clustering for Efficient Text-Video Retrieval , 2022, Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[14] James R. Glass,et al. Everything at Once – Multi-modal Fusion Transformer for Video Retrieval , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Pavel Zezula,et al. Efficient Indexing of 3D Human Motions , 2021, ICMR.
[16] Liang Lin,et al. Hierarchical Transformer: Unsupervised Representation Learning for Skeleton-Based Human Action Recognition , 2021, 2021 IEEE International Conference on Multimedia and Expo (ICME).
[17] Guoying Zhao,et al. Tripool: Graph triplet pooling for 3D skeleton-based action recognition , 2021, Pattern Recognit.
[18] Pengfei Xiong,et al. CLIP2Video: Mastering Video-Text Retrieval via Image CLIP , 2021, ArXiv.
[19] Ying Wang,et al. Deep Hashing for Motion Capture Data Retrieval , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Claudio Gennaro,et al. Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features , 2021, 2021 International Conference on Content-Based Multimedia Indexing (CBMI).
[21] Nan Duan,et al. CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval , 2021, Neurocomputing.
[22] Michael J. Black,et al. Action-Conditioned 3D Human Motion Synthesis with Transformer VAE , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] C. Theobalt,et al. Synthesis of Compositional Animations from Textual Descriptions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Liang Lin,et al. Motion-transformer: self-supervised pre-training for skeleton-based action recognition , 2021, MMAsia.
[26] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[27] Wenhan Yang,et al. MS2L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition , 2020, ACM Multimedia.
[28] Christopher D. Manning,et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text , 2020, MLHC.
[29] Andrea Esuli,et al. Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders , 2020, ACM Trans. Multim. Comput. Commun. Appl.
[30] Shihao Zou,et al. Action2Motion: Conditioned Generation of 3D Human Motions , 2020, ACM Multimedia.
[31] Andrea Esuli,et al. Transformer Reasoning Network for Image-Text Matching and Retrieval , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).
[32] Pavel Zezula,et al. Motion Words: A Text-Like Representation of 3D Skeleton Sequences , 2020, ECIR.
[33] Pavel Zezula,et al. LSTM-based real-time action detection and prediction in human motion streams , 2019, Multimedia Tools and Applications.
[34] Bjorn Ottersten,et al. Two-Stage RGB-Based Action Detection Using Augmented 3D Poses , 2019, CAIP.
[35] Nikolaus F. Troje,et al. AMASS: Archive of Motion Capture As Surface Shapes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Wenjun Zeng,et al. Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection , 2018, IEEE Transactions on Image Processing.
[37] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[38] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[39] Tamim Asfour,et al. The KIT Motion-Language Dataset , 2016, Big Data.
[40] Andrea Esuli,et al. Picture it in your mind: generating high level visual representations from textual descriptions , 2016, Information Retrieval Journal.
[41] Stefan Ulbrich,et al. Master Motor Map (MMM) — Framework and toolkit for capturing, representing, and reproducing human motion on humanoid robots , 2014, 2014 IEEE-RAS International Conference on Humanoid Robots.
[42] Tobias Schreck,et al. MotionExplorer: Exploratory Search in Human Motion Capture Data Based on Hierarchical Aggregation , 2013, IEEE Transactions on Visualization and Computer Graphics.
[43] Norman I. Badler,et al. Efficient motion retrieval in large motion databases , 2013, I3D '13.
[44] Atsushi Nakazawa,et al. A puppet interface for retrieval of motion capture data , 2011, SCA '11.
[45] Zhigang Deng,et al. Perceptually consistent example-based human motion retrieval , 2009, I3D '09.
[46] G. Amato,et al. SegmentCodeList: Unsupervised Representation Learning for Human Skeleton Data Retrieval , 2023, European Conference on Information Retrieval.
[47] Pavel Zezula,et al. Content-Based Management of Human Motion Data: Survey and Challenges , 2021, IEEE Access.
[48] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[49] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.