暂无分享,去创建一个
Songyang Zhang | Jiebo Luo | Zhengyuan Yang | Liwei Wang | Liwei Wang | Songyang Zhang | Jiebo Luo | Zhengyuan Yang
[1] Shin'ichi Satoh,et al. Faster R-CNN Features for Instance Search , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[2] Jiebo Luo,et al. Grounding-Tracking-Integration , 2019, ArXiv.
[3] Bernard Ghanem,et al. 2D-Driven 3D Object Detection in RGB-D Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[4] Danfei Xu,et al. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Kan Chen,et al. Video Object Grounding Using Semantic Roles in Language Description , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Liang Zhang,et al. Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Ahmed Abdelreheem,et al. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes , 2020, ECCV.
[9] Steven Lake Waslander,et al. Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[10] Leonidas J. Guibas,et al. ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Jitendra Malik,et al. Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[13] Vivek Mittal. AttnGrounder: Talking to Cars with Attention , 2020, ECCV Workshops.
[14] Jiebo Luo,et al. TAP: Text-Aware Pre-training for Text-VQA and Text-Caption , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Silvio Savarese,et al. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[17] Leonidas J. Guibas,et al. Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Jiebo Luo,et al. A Fast and Accurate One-Stage Approach to Visual Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Trevor Darrell,et al. Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Matthias Nießner,et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Bernard C. Kress,et al. 11‐1: Invited Paper: Towards the Ultimate Mixed Reality Experience: HoloLens Display Architecture Choices , 2017 .
[22] Junyu Luo,et al. TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding , 2021, ACM Multimedia.
[23] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[24] Licheng Yu,et al. MAttNet: Modular Attention Network for Referring Expression Comprehension , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Alexei A. Efros,et al. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Kan Chen,et al. Zero-Shot Grounding of Objects From Natural Language Queries , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Hwann-Tzong Chen,et al. Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation , 2021, AAAI.
[29] Yang Zhao,et al. Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Mark Chen,et al. Generative Pretraining From Pixels , 2020, ICML.
[31] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[32] Yun Fu,et al. Visual Semantic Reasoning for Image-Text Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Angel X. Chang,et al. Scan2Cap: Context-aware Dense Captioning in RGB-D Scans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[35] Liwei Wang,et al. Learning Two-Branch Neural Networks for Image-Text Matching Tasks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Silvio Savarese,et al. 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Li Jiang,et al. PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Stan Sclaroff,et al. CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions , 2021, ArXiv.
[39] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[40] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[41] Jianxiong Xiao,et al. Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Wengang Zhou,et al. TransVG: End-to-End Visual Grounding with Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Ruimao Zhang,et al. InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Matthias Niessner,et al. ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language , 2020, ECCV.
[45] Matthias Nießner,et al. 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.
[47] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[48] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Kangsoo Kim,et al. Revisiting Trends in Augmented Reality Research: A Review of the 2nd Decade of ISMAR (2008–2017) , 2018, IEEE Transactions on Visualization and Computer Graphics.
[50] Federico Tombari,et al. Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[52] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Jiebo Luo,et al. Improving One-stage Visual Grounding by Recursive Sub-query Construction , 2020, ECCV.
[54] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[55] Ali Farhadi,et al. LanguageRefer: Spatial-Language Model for 3D Visual Grounding , 2021, CoRL.
[56] Leonidas J. Guibas,et al. Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.