FashionAsk: A Multimedia based Question-Answering System

We demonstrate a multimedia-based question-answering system, named FashionAsk, that allows users to ask questions about pictures snapped with mobile devices. Instead of composing verbose questions to describe a visual instance, users provide the picture itself as part of the question. The significance of our system is twofold: (1) it is fully automatic, so no human expert is involved; and (2) it bypasses the need to know the name of the object under query, which is usually unknown to the asker. To answer these multi-modal questions, FashionAsk performs a large-scale instance search and metadata pooling to name the instance under query, and then matches the question against similar questions from community-contributed QA websites to retrieve answers. This demonstration is conducted on a million-scale dataset of Web images and QA pairs in the domain of fashion products. Asking a multimedia question through FashionAsk can take as little as five seconds to retrieve the candidate answer as well as suggested questions.
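The two-stage pipeline described above (name the instance, then match questions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not FashionAsk's implementation. It assumes the instance search has already returned visually similar Web images, each with a similarity score and its metadata tags. The instance is then named by a score-weighted vote over those tags. Stored questions are ranked by plain word-overlap (Jaccard) similarity, which stands in here for whatever question-matching method the actual system uses. All function names and the toy data are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical input: one (similarity score, metadata tags) pair per visual neighbour
# returned by an instance search that is assumed to have already run.
Neighbour = Tuple[float, List[str]]


def pool_metadata(neighbours: List[Neighbour], top_k: int = 10) -> List[Tuple[str, float]]:
    """Score-weighted vote over the tags of the top_k most similar images."""
    ranked = sorted(neighbours, key=lambda n: n[0], reverse=True)[:top_k]
    votes: Counter = Counter()
    for score, tags in ranked:
        # Lowercase and deduplicate so each image votes at most once per tag.
        for tag in dict.fromkeys(t.lower() for t in tags):
            votes[tag] += score
    # Sort by descending vote, breaking ties alphabetically for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def name_instance(neighbours: List[Neighbour], top_k: int = 10, n_terms: int = 2) -> str:
    """Join the highest-voted tags into a textual name for the queried instance."""
    pooled = pool_metadata(neighbours, top_k)
    return " ".join(tag for tag, _ in pooled[:n_terms])


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_questions(instance_name: str, user_text: str,
                    qa_bank: List[Tuple[str, str]], top_n: int = 3):
    """Rank stored (question, answer) pairs by word overlap with the named query."""
    query = tokens(instance_name + " " + user_text)
    scored = [(jaccard(query, tokens(q)), q, a) for q, a in qa_bank]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    neighbours = [(0.9, ["Nike", "sneaker"]), (0.8, ["nike", "running shoe"]),
                  (0.3, ["adidas", "sneaker"])]
    name = name_instance(neighbours)
    bank = [("Where can I buy a Nike sneaker?", "Try the official store."),
            ("How to wash a silk dress?", "Hand wash in cold water.")]
    for score, q, a in match_questions(name, "Where can I buy this?", bank):
        print(f"{score:.2f}  {q}  ->  {a}")
```

In this toy run the pooled name is "nike sneaker", so the photo-plus-"Where can I buy this?" query is matched to the stored Nike question. Weighting votes by similarity lets close visual matches dominate the name while still letting agreement across many neighbours outvote a single noisy tag.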
