Towards a Dynamic Linked Data Observatory

We describe work in progress on the design and methodology of the Dynamic Linked Data Observatory: a framework to monitor Linked Data over an extended period of time. The core goal of our work is to collect frequent, continuous snapshots of a subset of the Web of Data that is interesting for further study and experimentation, with an aim to capture raw data about the dynamics of Linked Data. The resulting corpora will be made openly and continuously available to the Linked Data research community. Herein, we (1) motivate the importance of such a corpus; (2) outline some of the use-cases and requirements for the resulting snapshots; (3) discuss different "views" of the Web of Data that affect how we define a sample to monitor; (4) detail how we select the scope of the monitoring experiment through sampling; and (5) discuss the final design of the monitoring framework that will gather regular snapshots of (subsets of) the Web of Data over the coming months and years.
