Hardware killed the software star

Until relatively recently, the development of data processing applications took place largely ignoring the underlying hardware. Only in niche applications (supercomputing, embedded systems) or in special software (operating systems, database internals, language runtimes) did (some) programmers have to pay attention to the actual hardware where the software would run. In most cases, working atop the abstractions provided by either the operating system or by system libraries was good enough. The constant improvements in processor speed did the rest. The new millennium has radically changed the picture. Driven by multiple needs (e.g., scale, physical constraints, energy limitations, virtualization, business models), hardware architectures are changing at a speed and in ways that current development practices for data processing cannot accommodate. From now on, software will have to be developed paying close attention to the underlying hardware and following strict performance engineering principles. In this paper, several aspects of the ongoing hardware revolution and its impact on data processing are analysed, pointing to the need for new strategies to tackle the challenges ahead.
