Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks

The workflows involved in industrial assembly and production activities are becoming increasingly complex. Performing them efficiently and safely is demanding for workers, in particular for infrequent or repetitive tasks. This burden can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows by observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. This idea has been realized in a prototype that combines state-of-the-art hardware and software components designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user’s pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace, and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback via a head-mounted display (HMD). A key distinguishing feature of the developed algorithms is that they operate solely on data from the on-body sensor network; no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. Although limited in size, these datasets highlight the potential of the chosen technology as an integrated whole and also point out limitations of the system.
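To make point 3) above concrete, the sketch below illustrates in simplified form how workflow monitoring from spatiotemporal pairwise relations might look. It is a minimal illustration under our own assumptions, not the system's actual implementation: per-frame object positions are assumed to be given (in the prototype they come from the egocentric vision pipeline), the qualitative relations are reduced to approaching/receding/stable, and the learnt workflow model is reduced to a reference sequence of expected relation snapshots. Names such as `pairwise_relations` and `monitor` are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' implementation) of workflow
# monitoring from pairwise qualitative spatial relations: for each frame we
# classify, per object pair, whether the objects are approaching, receding or
# stable, and compare the resulting relation sequence against a reference
# sequence assumed to have been learnt from expert demonstrations.

from itertools import combinations

def pairwise_relations(positions, prev_positions, eps=5.0):
    """Map each object pair to a qualitative relation based on the change in
    Euclidean distance between two consecutive frames (eps = dead band)."""
    relations = {}
    for a, b in combinations(sorted(positions), 2):
        d_now = sum((p - q) ** 2 for p, q in zip(positions[a], positions[b])) ** 0.5
        d_prev = sum((p - q) ** 2 for p, q in zip(prev_positions[a], prev_positions[b])) ** 0.5
        if d_now < d_prev - eps:
            relations[(a, b)] = "approaching"
        elif d_now > d_prev + eps:
            relations[(a, b)] = "receding"
        else:
            relations[(a, b)] = "stable"
    return relations

def monitor(frames, reference):
    """Compare the observed relation sequence against a reference workflow
    (a list of expected relation snapshots); report the first deviation as
    (frame index, object pair, observed relation, expected relation)."""
    for t, (prev, curr) in enumerate(zip(frames, frames[1:])):
        observed = pairwise_relations(curr, prev)
        expected = reference[min(t, len(reference) - 1)]
        for pair, rel in expected.items():
            if observed.get(pair) != rel:
                return t, pair, observed.get(pair), rel
    return None  # observed workflow is consistent with the reference

# Toy example: a "hand" moves towards a "screwdriver" while a "plate" stays put.
frames = [
    {"hand": (0.0, 0.0), "screwdriver": (100.0, 0.0), "plate": (0.0, 100.0)},
    {"hand": (40.0, 0.0), "screwdriver": (100.0, 0.0), "plate": (0.0, 100.0)},
    {"hand": (80.0, 0.0), "screwdriver": (100.0, 0.0), "plate": (0.0, 100.0)},
]
reference = [{("hand", "screwdriver"): "approaching", ("hand", "plate"): "receding"}]
print(monitor(frames, reference))  # -> None (no deviation from the reference)
```

The design point this sketch is meant to convey is the domain independence claimed in the abstract: because the monitor reasons over abstract pairwise relations rather than task-specific features, swapping in a new workflow only requires a new reference model, not new perception code.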
