Structural model discovery in temporal event data streams

This dissertation presents an approach to human behavior analysis in which experts guide and intervene in the analysis by interactively constructing and modifying behavior models. We introduce the research area of behavior analysis, the challenges the field faces, and the approaches currently available, and then present a new analysis approach: Interactive Relevance Search and Modeling (IRSM). More intelligent ways of conducting data analysis have been explored in recent years. Machine learning and data mining systems that perform pattern classification and discovery in non-textual data, e.g., face detection and crowd surveillance, promise to bring new generations of powerful “crawlers” for knowledge discovery. Such systems can capture many aspects of data, e.g., temporal information and extractable visual information (color, contrast, shape, etc.). However, these captured aspects may not uncover all salient information in the data or provide adequate models or patterns of the phenomena of interest. This is a challenging problem for social scientists who are trying to identify high-level, conceptual patterns of human behavior from observational data (e.g., media streams). The research presented here addresses how social scientists may derive patterns of human behavior captured in media streams. Currently, media streams are segmented into sequences of events describing the actions captured in the streams, such as interactions among humans. This segmentation creates a data space that is challenging to search because it consists of nonnumerical, temporal, descriptive data, e.g., Person A walks up to Person B at time T. This dissertation presents an approach that allows one to interactively search, identify, and discover temporal behavior patterns within such a data space. This research thereby supports exploration and discovery in behavior analysis through a formalized method of assisted exploration.
The model evolution presented supports refining the observer’s behavior models into representations of their understanding. The benefit of the new approach is shown through experiments on its identification accuracy and through collaboration with fellow researchers to verify the approach’s legitimacy in the analysis of their data.
