On quantifying the quality of information in social sensing

This thesis develops the fundamental theory and methodology for quantifying the Quality of Information (QoI) in social sensing. We use the term social sensing to refer to sensing applications in which humans play a critical role in the sensing or data collection process. Social sensing has emerged as a new paradigm for sensory data collection, motivated by the proliferation of mobile platforms equipped with a variety of sensors (e.g., GPS, camera, microphone, and motion sensors) in the possession of common individuals, networking capabilities that enable fast and convenient data sharing (e.g., WiFi and 4G), and large-scale dissemination opportunities (e.g., Twitter and Flickr). A significant challenge in social sensing applications lies in ascertaining the correctness of collected data and the reliability of information sources. We call this challenge QoI quantification in social sensing. Unlike well-calibrated and well-tested infrastructure sensors, humans are less reliable. The term participant (or source) reliability denotes the probability that a participant reports correct observations. Reliability may be impaired by poor sensor quality, lack of sensor calibration, lack of (human) attention to the task, or even intent to deceive. Moreover, data collection is often open to a large population, where it is impossible to screen all participants beforehand. The likelihood that a participant’s measurements are correct is usually unknown a priori. Consequently, it is very challenging to ascertain the correctness of data collected from sources of unknown reliability. Conversely, it is also challenging to ascertain the reliability of each information source without knowing whether the data it collected are true.
Therefore, the main questions posed in this thesis are: i) can we determine, in an optimal way, given only the collected measurements and without knowing the reliability of sources, which of the reported observations are true and which are not? ii) is a given source (participant) reliable or not? iii) how can the answers to the above questions be quantified? The thesis answers these questions by applying key insights from estimation theory and data fusion to develop new theories that accurately quantify both the participant reliability and correctness
