Caching and Materialization for Web Databases

Database systems have been driving dynamic websites since the early 1990s; nowadays, even seemingly static websites employ a database back-end for personalization and advertising purposes. To keep up with the high demand fuelled by the rapid growth of the Internet, a number of caching and materialization techniques have been proposed for web databases over the years. The main goal of these techniques is to improve the performance, scalability, and manageability of database-driven dynamic websites without compromising the quality of the data. Although caching and materialization are well-understood concepts in the traditional database and networking/operating systems literature, the Web and web databases have unique characteristics that warrant new techniques and approaches. In this monograph, we adopt a data management point of view to describe the system architectures of web databases and to analyze the research issues related to caching and materialization in such architectures. We also present the state of the art in caching and materialization for web databases and organize current approaches according to three fundamental questions: how to store, how to use, and how to maintain cached/materialized web data. Finally, we relate work on caching and materialization for web databases to similar techniques in other areas, such as data warehousing, distributed systems, and distributed databases.
