Exploring optimization and caching for efficient collection operations

Many large programs operate on collection types. Many programming languages offer extensive libraries, such as the C++ Standard Template Library, that make programming with collections convenient. Extending programming languages to provide collection queries as first-class constructs would not only allow programmers to write queries explicitly in their programs but would also allow compilers to draw on the wealth of experience from the database domain to optimize such queries. This paper describes an approach that reduces the run time of programs involving explicit collection queries by performing run-time query optimization that is effective for single runs of a program, and by using a cache to store previously computed results. The approach relies on histograms built from the data at run time to estimate the selectivity of joins and predicates in order to construct query plans. Information from earlier executions of the same query within a run is reused when constructing query plans, even when the data has changed between those executions. An effective policy is also determined for caching the results of join (sub)queries. The cache is maintained incrementally when the underlying collections change, and use of the cache space is optimized by a cache replacement policy. Our approach has been implemented within the Java Query Language (JQL) framework using AspectJ. We show that run-time query optimization combined with caching of subquery results significantly improves the run time of programs with explicit queries over equivalent programs that perform collection operations by iterating over those collections. We evaluate the approach on synthetic as well as real-world Robocode programs, using JQL as the baseline. Experimental results show that our approach outperforms JQL with respect to program run time.
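
To make the histogram-based estimation mentioned above more concrete, the following is a minimal sketch in Java of how an equi-width histogram could be used to estimate the selectivity of a range predicate and the size of an equi-join. It is not the paper's implementation: the class name EquiWidthHistogram, its methods, and the assumption that values are spread uniformly inside each bucket are all illustrative choices made for this example.

```java
/**
 * Hypothetical equi-width histogram over integer values in [min, max].
 * All names here are assumptions for illustration, not JQL APIs.
 */
final class EquiWidthHistogram {
    private final int min;
    private final int max;
    private final int[] counts;
    private int total;

    EquiWidthHistogram(int min, int max, int buckets) {
        if (max < min || buckets <= 0) {
            throw new IllegalArgumentException("invalid range or bucket count");
        }
        this.min = min;
        this.max = max;
        this.counts = new int[buckets];
    }

    /** Width of one bucket, measured in number of integer values. */
    private double width() {
        return (max - min + 1) / (double) counts.length;
    }

    /** Records one value observed in the collection at run time. */
    void add(int v) {
        if (v < min || v > max) {
            throw new IllegalArgumentException("value out of range: " + v);
        }
        int bucket = Math.min((int) ((v - min) / width()), counts.length - 1);
        counts[bucket]++;
        total++;
    }

    /**
     * Estimated fraction of values strictly less than bound.
     * Assumes values are uniformly distributed inside each bucket.
     */
    double selectivityLessThan(int bound) {
        if (total == 0 || bound <= min) return 0.0;
        if (bound > max) return 1.0;
        double w = width();
        double matched = 0.0;
        for (int i = 0; i < counts.length; i++) {
            double lo = min + i * w;
            double hi = lo + w;
            if (bound >= hi) {
                matched += counts[i];                    // bucket fully below bound
            } else if (bound > lo) {
                matched += counts[i] * (bound - lo) / w; // bucket partially below bound
            }
        }
        return matched / total;
    }

    /**
     * Estimated number of result pairs of an equi-join with other.
     * Assumes both histograms share the same bucket boundaries and that
     * each bucket's values are spread evenly over the integers it covers.
     */
    double estimateEquiJoinSize(EquiWidthHistogram other) {
        if (other.min != min || other.max != max || other.counts.length != counts.length) {
            throw new IllegalArgumentException("histograms must share bucket boundaries");
        }
        double w = width();
        double size = 0.0;
        for (int i = 0; i < counts.length; i++) {
            size += (double) counts[i] * other.counts[i] / w;
        }
        return size;
    }
}
```

A planner built on such estimates could, for example, evaluate the predicate or join with the smallest estimated result first, so that later operators see fewer tuples. Because the histogram is only an approximation, the estimates guide plan choice rather than guarantee the cheapest plan.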
