Columnar objects: improving the performance of analytical applications

As data volumes grow, so does the demand to use that data in analytical applications that support informed decisions. Unfortunately, object-oriented runtimes suffer performance problems when processing large amounts of data. Column-oriented in-memory databases have addressed similar problems with a memory layout tailored to analytical workloads, so data storage and processing are often delegated to such a database. However, the more domain logic moves into this separate system, the more of the benefits of object-orientation are lost. We propose modifications to dynamic object-oriented runtimes that store collections of objects in a column-oriented memory layout and use a JIT compiler to exploit that layout by mapping object traversal to array operations. We implemented our concept in PyPy, a Python interpreter equipped with a tracing JIT compiler. We show that analytical algorithms expressed in object-oriented code run up to three times faster with our optimizations, without substantially impairing the object-oriented paradigm. We expect that extending these concepts can mitigate some of the problems that stem from the paradigm mismatch between object-oriented runtimes and databases.
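The idea of a column-oriented layout behind an object-oriented interface can be illustrated with a minimal sketch in plain Python. It is not the PyPy implementation described above: the names `ColumnarCollection`, `RowProxy`, `append`, and `column` are hypothetical, and the layout is built by hand at the library level, whereas the proposed approach changes the runtime itself and relies on the tracing JIT compiler to turn object traversal into array operations.

```python
from array import array


class ColumnarCollection:
    """Hypothetical container that keeps one typed array per attribute."""

    def __init__(self, **typecodes):
        # One contiguous array per attribute (assumes at least one attribute),
        # e.g. price="d" for doubles, quantity="l" for signed integers.
        self._columns = {name: array(code) for name, code in typecodes.items()}

    def append(self, **values):
        # Split the "object" into its fields and store each in its column.
        for name, column in self._columns.items():
            column.append(values[name])

    def __len__(self):
        return len(next(iter(self._columns.values())))

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return RowProxy(self, index)

    def column(self, name):
        # Direct access to the underlying array of one attribute.
        return self._columns[name]


class RowProxy:
    """Lightweight object view onto one row of a ColumnarCollection."""

    __slots__ = ("_owner", "_index")

    def __init__(self, owner, index):
        self._owner = owner
        self._index = index

    def __getattr__(self, name):
        # Only called for names that are not slots, i.e. the column names.
        try:
            return self._owner._columns[name][self._index]
        except KeyError:
            raise AttributeError(name) from None


orders = ColumnarCollection(price="d", quantity="l")
orders.append(price=2.5, quantity=4)
orders.append(price=10.0, quantity=1)
orders.append(price=1.25, quantity=8)

# Object-oriented traversal: one proxy object per element.
object_style = sum(order.price * order.quantity for order in orders)

# The same computation expressed directly over the two columns.
array_style = sum(
    p * q for p, q in zip(orders.column("price"), orders.column("quantity"))
)
```

Both expressions compute the same total. In this sketch the object-style loop still pays for a proxy per element; the premise of the approach above is that a JIT compiler can remove that overhead, so that code written in the first style runs like the second.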
