Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

People can refer to quantities in a visual scene by using either exact cardinals (e.g., one, two, three) or natural language quantifiers (e.g., few, most, all). In humans, these two processes rely on fairly different cognitive and neural mechanisms. Inspired by this evidence, the present study proposes two models for learning the objective meaning of cardinals and quantifiers from visual scenes containing multiple objects. We show that a model capitalizing on a ‘fuzzy’ measure of similarity is effective for learning quantifiers, whereas exact cardinals are better learned when information about number is provided.
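The abstract does not describe how either model works internally. The toy sketch below is not the authors' models and only illustrates the contrast it draws. A quantifier can be chosen by graded similarity between the proportion of target objects in a scene and a prototype proportion for each word, so nearby proportions share a label. A cardinal depends on the exact count, so a near miss is simply wrong. The prototype values, the Gaussian similarity, the `width` parameter, and the function names `pick_quantifier` and `pick_cardinal` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical prototype proportions (illustrative values, not from the paper):
# the fraction of target objects each quantifier is assumed to typically describe.
QUANTIFIER_PROTOTYPES = {
    "no": 0.0,
    "few": 0.2,
    "some": 0.4,
    "most": 0.75,
    "all": 1.0,
}

# Exact cardinal labels for counts 0..10 (hypothetical range for the toy example).
CARDINALS = ["zero", "one", "two", "three", "four", "five",
             "six", "seven", "eight", "nine", "ten"]


def fuzzy_similarity(a: float, b: float, width: float = 0.15) -> float:
    """Graded similarity in (0, 1]: 1.0 when a == b, decaying smoothly with distance."""
    return math.exp(-((a - b) ** 2) / (2 * width ** 2))


def pick_quantifier(targets: int, total: int) -> str:
    """Return the quantifier whose prototype proportion is most similar to the scene's."""
    if total <= 0 or not 0 <= targets <= total:
        raise ValueError("need 0 <= targets <= total and total > 0")
    proportion = targets / total
    return max(QUANTIFIER_PROTOTYPES,
               key=lambda q: fuzzy_similarity(proportion, QUANTIFIER_PROTOTYPES[q]))


def pick_cardinal(targets: int) -> str:
    """Exact cardinals need the count itself; there is no graded neighbourhood."""
    if not 0 <= targets < len(CARDINALS):
        raise ValueError("count outside the toy cardinal range")
    return CARDINALS[targets]


if __name__ == "__main__":
    for n in (7, 8):
        print(n, "of 10:", pick_quantifier(n, 10), "/", pick_cardinal(n))
```

With ten objects, seven and eight targets both fall closest to the "most" prototype, while their cardinals differ. In this toy setting a similarity-based rule tolerates small counting errors for quantifiers but cannot recover exact cardinals without the count.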
