Unsupervised understanding of location and illumination changes in egocentric videos

Wearable cameras stand out as one of the most promising devices for the upcoming years; as a consequence, the demand for computer algorithms that automatically understand the videos they record is increasing quickly. Understanding these videos automatically is not an easy task: the mobile nature of the camera poses important challenges, such as changing light conditions and unrestricted recording locations. This paper proposes an unsupervised strategy based on global features and manifold learning to endow wearable cameras with contextual information about the light conditions and the location captured. Results show that non-linear manifold methods can capture contextual patterns from global features without requiring large computational resources. As an application case, the proposed strategy is used as a switching mechanism to improve hand detection in egocentric videos.
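To make the idea of combining global features with non-linear manifold learning concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in the paper. It assumes a per-channel intensity histogram as the global feature and an Isomap-style embedding (neighborhood graph, shortest paths, classical MDS) as the manifold method. The function names `global_histogram` and `isomap_embed`, the bin count, and the extra link between consecutive frames are illustrative choices introduced here.

```python
import numpy as np


def global_histogram(frame, bins=8):
    """Global feature (assumed here): per-channel intensity histogram of an
    H x W x C uint8 frame, concatenated and normalized to sum to 1."""
    counts = [
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]
    v = np.concatenate(counts).astype(float)
    return v / v.sum()


def isomap_embed(X, n_neighbors=5, n_components=2):
    """Isomap-style embedding of the rows of X (one feature vector per frame).

    Assumption of this sketch: rows are in temporal order, and consecutive
    frames are always linked so that the neighborhood graph stays connected.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between feature vectors.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Symmetric k-nearest-neighbor graph; inf marks "no edge".
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]

    # Temporal links between consecutive frames (guarantees connectivity).
    for i in range(n - 1):
        G[i, i + 1] = G[i + 1, i] = D[i, i + 1]

    # Geodesic distances via Floyd-Warshall, vectorized over each pivot k.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])

    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

Given a list of frames, stacking `global_histogram(f)` for each frame and passing the result to `isomap_embed` yields one low-dimensional point per frame. A downstream switching mechanism could then, for example, select a detector according to the region of the embedding in which the current frame falls; that selection step is left out of this sketch.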
