LeSSS: Learned Shared Semantic Spaces for Relating Multi‐Modal Representations of 3D Shapes

In this paper, we propose a new method for structuring multi‐modal representations of shapes according to semantic relations. We learn a metric that links semantically similar objects represented in different modalities. First, 3D shapes are associated with textual labels by learning how textual attributes relate to the observed geometry. Correlations between similar labels are captured by simultaneously embedding labels and shape descriptors into a common latent space in which the inner product corresponds to similarity. The mapping is learned robustly by optimizing a rank‐based loss function under a sparseness prior on the spectrum of the matrix of all classifiers. Second, we extend this framework to relate multi‐modal representations of geometric objects. The key idea is that weak cues from shared human labels are sufficient to obtain a consistent embedding of related objects even though their representations are not directly comparable. We evaluate our method against common baseline approaches, investigate the influence of different geometric descriptors, and demonstrate a prototypical multi‐modal browser that relates 3D objects to text, photographs, and 2D line sketches.
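The joint embedding described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a WSABIE-style pairwise hinge loss as the rank-based loss, plain stochastic gradient descent, and Frobenius-norm weight decay on the two factors as a stand-in for the sparseness prior on the spectrum of the classifier matrix. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def make_toy_data(n_per_label=20, n_labels=3, dim_in=5, seed=0):
    """Hypothetical stand-in for shape descriptors: one noisy cluster per label."""
    rng = np.random.default_rng(seed)
    centers = 3.0 * np.eye(n_labels, dim_in)           # requires dim_in >= n_labels
    y = np.repeat(np.arange(n_labels), n_per_label)
    X = centers[y] + 0.3 * rng.normal(size=(y.size, dim_in))
    return X, y

def train_joint_embedding(X, y, n_labels, dim=8, epochs=100, lr=0.02,
                          reg=1e-3, seed=0):
    """Learn V (descriptor -> latent) and U (one latent vector per label).

    The score of label c for descriptor x is the inner product u_c . (V x).
    Assumption: weight decay on U and V replaces the paper's spectral prior;
    it keeps the implied classifier matrix U^T V low-rank (rank <= dim).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = rng.normal(scale=0.1, size=(dim, d))
    U = rng.normal(scale=0.1, size=(dim, n_labels))
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, pos = X[i], y[i]
            z = V @ x                                   # latent position of the shape
            scores = U.T @ z                            # similarity to every label
            margins = 1.0 - scores[pos] + scores        # pairwise hinge terms
            margins[pos] = -np.inf                      # ignore the correct label
            neg = int(np.argmax(margins))               # most violating wrong label
            if margins[neg] <= 0.0:
                continue                                # correct label ranked first with margin
            diff = U[:, neg] - U[:, pos]                # gradient of the hinge w.r.t. z
            U[:, pos] += lr * z                         # pull correct label toward the shape
            U[:, neg] -= lr * z                         # push violating label away
            V -= lr * np.outer(diff, x)                 # move the shape in latent space
            U *= 1.0 - lr * reg                         # Frobenius-norm weight decay
            V *= 1.0 - lr * reg
    return U, V

def predict_labels(U, V, X):
    """Return the highest-scoring label index for each row of X."""
    return np.argmax(X @ V.T @ U, axis=1)
```

Under the same assumptions, a second modality (for example, image or sketch descriptors) could be handled by learning a separate projection matrix for it while sharing the label matrix `U`, so that objects from both modalities land in the same latent space through their common labels.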
