withyou—An Experimental End-to-End Telepresence System Using Video-Based Reconstruction

Supporting a wide set of linked non-verbal resources remains a persistent challenge for communication technology and limits its effectiveness in many applications. Interpersonal distance, gaze, posture, and facial expression are interpreted together to manage and add meaning to most conversations, yet today's technologies favor some of these resources over others. This induces confusion in conversations and is believed to limit feelings of togetherness and trust, as well as the growth of empathy and rapport. Solving this problem would allow technologies to support most interactional scenarios rather than a few. It is likely to benefit teamwork and team cohesion, distributed decision-making, and health and wellbeing applications such as tele-therapy, tele-consultation, and the reduction of isolation. We introduce withyou, our telepresence research platform. This paper describes the end-to-end system, including the psychology of human interaction and how it drives requirements throughout the design and implementation. Our technology approach is to combine the winning characteristics of video conferencing and immersive collaborative virtual environments, so that, for example, people walking past each other can exchange a glance and a smile. A systematic explanation of the theory brings together the linked nature of non-verbal communication and how it is influenced by technology. This leads to functional requirements for telepresence, expressed as a balance of visual, spatial, and temporal qualities. We give the first end-to-end description of withyou, covering all major processes and the display and capture environment. We then characterize our approach in terms of these qualities and the factors that influence them, which has not been done before. This leads to non-functional requirements on the number and placement of cameras and on the avoidance of the resulting bottlenecks. Proposals are given for improved distribution of processes across networks, computers, and multi-core CPUs and GPUs.
Simple, conservative estimates show that both proposed approaches should meet our requirements. One has been implemented and is shown to meet the minimum requirements and to come close to the desirable ones.
