CNN-Based Touch Interaction Detection for Infant Speech Development

In this paper, we investigate the detection of interaction between two people in videos, namely a caregiver and an infant. We focus on one particular type of human interaction, touch, because touch is a key social and emotional signal that caregivers use when interacting with their children. We propose an automatic touch event recognition method that determines the time intervals in which the caregiver is likely touching the infant. In addition to labeling the touch events, we classify them into six touch types according to which body part of the infant has been touched. CNN-based human pose estimation and person segmentation are used to analyze the spatial relationship between the caregiver's hands and the infant's body. We demonstrate promising results for touch detection and show that the method has strong potential to reduce the human effort required to manually generate precise touch annotations.
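The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a pose estimator has already produced pixel coordinates for the caregiver's hand and for the infant's body parts, and that a segmentation network has produced a binary infant mask. The function names, the square search window, the nearest-keypoint rule, and the body-part labels are all hypothetical choices made for illustration.

```python
from math import hypot


def hand_on_infant(hand_xy, infant_mask, radius=1):
    """Return True if any infant pixel lies in a square window around the hand.

    hand_xy is an (x, y) pixel coordinate; infant_mask is a list of rows of 0/1.
    The window half-width `radius` is an assumed tolerance, not a published value.
    """
    x, y = hand_xy
    height, width = len(infant_mask), len(infant_mask[0])
    for row in range(max(0, y - radius), min(height, y + radius + 1)):
        for col in range(max(0, x - radius), min(width, x + radius + 1)):
            if infant_mask[row][col]:
                return True
    return False


def nearest_part(hand_xy, infant_keypoints):
    """Return the label of the infant keypoint closest to the hand (assumed rule)."""
    def distance(part):
        px, py = infant_keypoints[part]
        return hypot(hand_xy[0] - px, hand_xy[1] - py)
    return min(infant_keypoints, key=distance)


def touch_intervals(flags, fps):
    """Merge runs of consecutive touch frames into (start_s, end_s) intervals."""
    intervals = []
    start = None
    for index, touching in enumerate(flags):
        if touching and start is None:
            start = index  # a new run of touch frames begins here
        elif not touching and start is not None:
            intervals.append((start / fps, index / fps))
            start = None
    if start is not None:  # the video ended during a touch
        intervals.append((start / fps, len(flags) / fps))
    return intervals
```

In this sketch, per-frame decisions from `hand_on_infant` would be collected into a list of booleans and passed to `touch_intervals` to obtain candidate time intervals, while `nearest_part` would assign each touch frame a body-part label. A practical system would likely also need temporal smoothing and handling of occluded or missing keypoints, which are omitted here.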
