Paper information - What looks good with my sofa: Multimodal search engine for interior design

In this paper, we propose a multimodal search engine for interior design that combines visual and textual queries. The goal of our engine is to retrieve interior objects, e.g. furniture or wall clocks, that share visual and aesthetic similarities with the query. The user can take a photo of a room and retrieve, with high recall, a list of items identical or visually similar to those present in the photo. The engine can also return other items that fit well with them aesthetically and stylistically. To achieve this goal, our system blends the results obtained from the textual and visual modalities. This blending strategy increases the average style similarity score of the retrieved items by 11%. Our work is implemented as a Web-based application, and we plan to open it to the public.
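The abstract does not say how the two modalities are blended, so the following is only a minimal sketch of one common option, weighted late fusion, and not the authors' method. It assumes each catalogue item already has a visual similarity score and a textual similarity score against the query; the function names, the min-max normalisation, and the weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_scores(query, items):
    """Cosine similarity between one query vector and each row of `items`."""
    query = np.asarray(query, dtype=float)
    items = np.asarray(items, dtype=float)
    query = query / np.linalg.norm(query)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return items @ query


def _min_max(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def blend_rankings(visual_scores, text_scores, alpha=0.5, top_k=3):
    """Hypothetical late fusion: weighted sum of normalised scores.

    alpha weights the visual modality and (1 - alpha) the textual one.
    Returns the indices of the top_k items and the blended scores.
    """
    blended = alpha * _min_max(visual_scores) + (1 - alpha) * _min_max(text_scores)
    order = np.argsort(-blended, kind="stable")[:top_k]
    return order.tolist(), blended


if __name__ == "__main__":
    # Toy scores for three catalogue items (made-up numbers, not paper data).
    visual = [0.9, 0.1, 0.5]
    text = [0.0, 1.0, 0.8]
    ranking, scores = blend_rankings(visual, text, alpha=0.5, top_k=3)
    print(ranking, scores)
```

With `alpha` near 1 the ranking follows visual similarity alone, and with `alpha` near 0 it follows the textual query alone; intermediate values favour items that score reasonably well in both modalities.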
