Lifelogging with SAESNEG: a system for the automated extraction of social network event groups

This thesis presents SAESNEG, a System for the Automated Extraction of Social Network Event Groups: a pipeline for aggregating a personal social media footprint and partitioning it into events (the event clustering problem). SAESNEG facilitates a reminiscence-friendly user experience in which the user can navigate their social media footprint. A range of socio-technical issues are explored: the challenges to reminiscence, lifelogging, ownership, and digital death. Whilst previous systems have focused on organising a single type of data, such as photos or Tweets, SAESNEG handles the variety of social network documents found in a typical footprint (e.g. photos, Tweets, check-ins), each with a different mix of image, text and other metadata (differently heterogeneous data), and is adapted to the sparse, private events typical of the personal social media footprint. Phase A extracts information, focusing on natural language processing; new techniques are developed, including a novel distributed approach to handling temporal expressions and a parser for social events (such as birthdays). Information is also extracted from images and metadata, and the resulting annotations feed the subsequent event clustering. Phase B performs event clustering by applying a number of pairwise similarity strategies, a mixture of new and existing algorithms. Clustering itself is achieved by combining machine learning with correlation clustering. The main contributions of this thesis are the identification of the technical research task (and the associated social need), the development of novel algorithms and approaches, and the integration of these with existing algorithms to form the pipeline. Results demonstrate SAESNEG's capability to perform event clustering on a differently heterogeneous dataset, enabling users to achieve lifelogging in the context of their existing social media networks.
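The Phase B idea of clustering from pairwise similarity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the `same_event_score` function is a hypothetical time-gap stand-in for the learned pairwise classifier, the document fields are invented, and the clustering step is a simple greedy pivot heuristic for correlation clustering, not necessarily the solver SAESNEG uses.

```python
from datetime import datetime

def same_event_score(a, b, window_hours=6.0):
    """Hypothetical pairwise score in [-1, 1].

    Stands in for a learned classifier: the score is positive only when
    the two documents are less than window_hours / 2 apart in time.
    """
    gap = abs((a["time"] - b["time"]).total_seconds()) / 3600.0
    return 1.0 - 2.0 * min(gap / window_hours, 1.0)

def pivot_correlation_clustering(docs, score):
    """Greedy pivot heuristic: take the first unclustered document as the
    pivot, group it with every remaining document it scores positively
    against, and repeat on what is left."""
    remaining = list(range(len(docs)))
    clusters = []
    while remaining:
        pivot = remaining[0]
        cluster = [pivot] + [
            i for i in remaining[1:] if score(docs[pivot], docs[i]) > 0
        ]
        clusters.append(cluster)
        remaining = [i for i in remaining if i not in cluster]
    return clusters

# Invented example footprint mixing document types.
docs = [
    {"id": "photo1", "time": datetime(2014, 6, 1, 14, 0)},
    {"id": "tweet1", "time": datetime(2014, 6, 1, 15, 30)},
    {"id": "checkin1", "time": datetime(2014, 6, 1, 16, 0)},
    {"id": "photo2", "time": datetime(2014, 6, 9, 20, 0)},
]
events = [
    [docs[i]["id"] for i in cluster]
    for cluster in pivot_correlation_clustering(docs, same_event_score)
]
print(events)
```

In this toy footprint the three documents from the same afternoon fall into one event and the later photo forms its own. A real pipeline would replace the time-gap score with a combination of several similarity strategies over the Phase A annotations.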
