Understanding real-world phenomena from human-generated sensor data

Data are increasingly available. Smartphones, wearable devices, social media, web browsing histories and sales records are only a few of the newly available information sources. Analysing this kind of information is an important step towards understanding human behaviour. In this dissertation, I propose novel techniques for uncovering the complex dependencies between factors extracted from raw sensor data and real-world phenomena, and I demonstrate the potential of using the vast amount of human digital traces to better understand human behaviour and the factors influenced by it. In particular, two main problems are considered: 1) whether there is a dependency between social media data and traded asset prices, and 2) how smartphone sensor data can be used to understand the factors that influence our stress levels. I focus on uncovering the structural dependencies among the factors of interest rather than on detecting mere correlation. Special attention is given to enhancing the reliability of the findings by developing techniques that can better handle the specific characteristics of the examined datasets. Although the developed approaches are motivated by specific problems related to human-generated sensor data, they are general and can be applied to any dataset with similar characteristics.
