Agrios: A Hybrid Approach to Scalable Data Analysis Systems

At the heart of Agrios lies Bonneville, an extension of the Columbia database optimizer. Bonneville uses Columbia’s methods for exploring the search space but differs from it in two ways. First, Bonneville is designed for use with an array data model, so the transformations, rules, and related machinery that guide exploration of the search space differ. Second, Bonneville considers the location of data objects when assigning costs to plans.
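To make the second point concrete, the following is a minimal sketch of location-aware plan costing. It is not Bonneville's actual cost model: the class names `ArrayRef` and `Plan`, the site labels, the `mb_cost` parameter, and the linear transfer-cost formula are all assumptions introduced for illustration. The only idea taken from the text is that a plan's cost depends on where its input data objects reside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayRef:
    """A data object and the site where it currently resides (hypothetical)."""
    name: str
    size_mb: float
    site: str


@dataclass(frozen=True)
class Plan:
    """A candidate plan: one operator run at one site (hypothetical)."""
    op: str
    inputs: tuple
    exec_site: str
    compute_cost: float


def transfer_cost(arr, dest, mb_cost=1.0):
    # Assumed model: moving data costs mb_cost per MB; local data is free.
    return 0.0 if arr.site == dest else arr.size_mb * mb_cost


def plan_cost(plan, mb_cost=1.0):
    # Total cost = compute cost + cost of shipping every non-local input.
    moved = sum(transfer_cost(a, plan.exec_site, mb_cost) for a in plan.inputs)
    return plan.compute_cost + moved


def cheapest(plans, mb_cost=1.0):
    # Pick the alternative with the lowest location-aware cost.
    return min(plans, key=lambda p: plan_cost(p, mb_cost))


if __name__ == "__main__":
    big = ArrayRef("A", size_mb=100.0, site="site_x")
    small = ArrayRef("B", size_mb=1.0, site="site_y")
    candidates = [
        Plan("multiply", (big, small), exec_site="site_y", compute_cost=5.0),
        Plan("multiply", (big, small), exec_site="site_x", compute_cost=2.0),
    ]
    best = cheapest(candidates)
    print(best.exec_site, plan_cost(best))
```

Under these assumed numbers, running the operator where the large array already lives avoids shipping 100 MB, so that alternative wins even before its lower compute cost is counted. A location-blind cost model could not tell the two plans apart on data movement at all.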
