First-Person Vision

The traditional approach to understanding a person's behavior, intent, and environment follows the surveillance metaphor: install cameras and observe the subject and his or her interactions with other people and the environment. We argue instead that first-person vision (FPV), which senses the environment and the subject's activities from a wearable sensor, is more advantageous: it provides images of the subject's environment taken from his or her viewpoint, along with readily available information about head motion and, through eye tracking, gaze. In this paper, we review the key research challenges that must be addressed to develop such FPV systems and describe our ongoing work on them, using examples from our prototype systems.
