Visual Natural Language Query Auto-Completion for Estimating Instance Probabilities

We present a new task of query auto-completion for estimating instance probabilities. We complete a user query prefix conditioned upon an image. Given the completed query, we fine-tune a BERT embedding to estimate probabilities over a broad set of instance classes. The resulting instance probabilities are used for selection while remaining agnostic to the segmentation or attention mechanism. Our results demonstrate that auto-completion conditioned on both language and vision outperforms auto-completion using language alone, and that fine-tuning a BERT embedding allows us to efficiently rank the instances in an image. In the spirit of reproducible research, we make our data, models, and code available.
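The final ranking step described above can be sketched as follows. This is a minimal, illustrative sketch, assuming the fine-tuned BERT head outputs one raw logit per instance class for the completed query; the function name, class labels, and logit values are hypothetical, not the paper's actual implementation.

```python
import math

def rank_instances(logits):
    """Turn per-instance logits into a softmax distribution and
    return (label, probability) pairs sorted by descending probability."""
    m = max(logits.values())  # subtract max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical logits a fine-tuned BERT head might produce
# for a completed query such as "the dog on the left".
logits = {"dog": 4.2, "person": 1.1, "cat": 0.3}
ranking = rank_instances(logits)
```

Because the ranking consumes only per-class scores, it stays agnostic to whichever segmentation or attention mechanism later localizes the selected instance.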
