Intensional data on the web

We call data intensional when it is not directly available but must be accessed through a costly interface. Intensional data naturally arises in a number of Web data management scenarios, such as Web crawling or ontology-based data access. Such scenarios require us to model an uncertain view of the world and, given a query, to answer the question "What is the best thing to do next?" Once data has been retrieved, our knowledge of the world is revised, and the whole process is repeated until enough knowledge about the world has been obtained for the application at hand. In this article, we give an overview of the steps underlying all intensional data management scenarios and illustrate them on three concrete applications: focused crawling, online influence maximization in social networks, and mining crowdsourced data.
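As a rough illustration of the loop described above (choose an access, retrieve, revise knowledge, repeat), here is a minimal Python sketch under strong simplifying assumptions. Each hypothetical `Source` stands for one kind of costly access, such as a page to crawl or a question to a crowd worker. The uncertain view of the world is reduced to a Beta belief about how often each source yields a useful item. "The best thing to do next" is chosen greedily by expected usefulness per unit of cost. The names `Source` and `intensional_loop`, and the greedy policy itself, are illustrative choices, not the method of the article.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List


@dataclass
class Source:
    """A hypothetical costly interface (e.g. a page to crawl, a crowd worker to ask)."""
    name: str
    cost: float                  # assumed strictly positive
    access: Callable[[], bool]   # True if the retrieved item is useful for the query
    # Beta(alpha, beta) belief about the probability that one access is useful.
    alpha: float = 1.0
    beta: float = 1.0

    def expected_usefulness(self) -> float:
        # Mean of the Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def intensional_loop(sources: List[Source], budget: float, target: int) -> Dict[str, int]:
    """Repeatedly pick the next access, retrieve, and revise beliefs.

    Stops once `target` useful items are found or no affordable access remains.
    Returns how many times each source was accessed.
    """
    useful = 0
    accesses = {s.name: 0 for s in sources}
    while useful < target:
        affordable = [s for s in sources if s.cost <= budget]
        if not affordable:
            break
        # "What is the best thing to do next?": greedily take the access with the
        # highest expected usefulness per unit of cost under current beliefs.
        best = max(affordable, key=lambda s: s.expected_usefulness() / s.cost)
        budget -= best.cost
        accesses[best.name] += 1
        if best.access():
            useful += 1
            best.alpha += 1  # Bayesian update after observing a useful item
        else:
            best.beta += 1   # Bayesian update after observing a useless item
    return accesses


# Usage with deterministic, simulated sources.
good = cycle([True, True, False])
bad = cycle([False])
demo = [Source("a", 1.0, lambda: next(good)), Source("b", 1.0, lambda: next(bad))]
print(intensional_loop(demo, budget=10.0, target=3))  # → {'a': 4, 'b': 0}
```

Because the greedy choice never revisits a source whose estimate looks worse, source `b` is never tried in this run. Policies that balance exploration and exploitation, such as UCB or Thompson sampling for multi-armed bandits, or a full Markov decision process formulation, are the usual ways to address that limitation.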
