It's Time to Think About an Operating System for Near Data Processing Architectures

[1]  Ronald Minnich,et al.  NIX: A case for a manycore system for cloud computing , 2012, Bell Labs Technical Journal.

[2]  Jan Reineke,et al.  Ascertaining Uncertainty for Efficient Exact Cache Analysis , 2017, CAV.

[3]  David A. Patterson,et al.  Attack of the killer microseconds , 2017, Commun. ACM.

[4]  Dejan S. Milojicic,et al.  Beyond Processor-centric Operating Systems , 2015, HotOS.

[5]  Kiyoung Choi,et al.  PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[6]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[8]  Dean M. Tullsen,et al.  Execution migration in a heterogeneous-ISA chip multiprocessor , 2012, ASPLOS XVII.

[9]  C. V. Ramamoorthy,et al.  Parallel Task Execution in a Decentralized System , 1972, IEEE Transactions on Computers.

[10]  Yi-Ping You,et al.  VirtCL: a framework for OpenCL device abstraction and management , 2015, PPoPP.

[11]  Maurice Herlihy,et al.  The art of multiprocessor programming , 2020, PODC '06.

[12]  Tamara Schmitz,et al.  The Rise of Serial Memory and the Future of DDR , 2014 .

[13]  Jung Ho Ahn,et al.  Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules , 2016, IEEE Micro.

[14]  Sparsh Mittal,et al.  A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors , 2016, ACM Comput. Surv.

[16]  Tony M. Brewer,et al.  Instruction Set Innovations for the Convey HC-1 Computer , 2010, IEEE Micro.

[17]  Andrew W. Moore,et al.  NetFPGA SUME: Toward 100 Gbps as Research Commodity , 2014, IEEE Micro.

[18]  Jinyoung Lee,et al.  Biscuit: A Framework for Near-Data Processing of Big Data Workloads , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[19]  Marco Minutoli,et al.  Implementing Radix Sort on Emu 1 , 2015 .

[20]  Idit Keidar,et al.  GPUfs: Integrating a file system with GPUs , 2013, TOCS.

[21]  David A. Wood,et al.  Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[22]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[23]  Werner Retschitzegger,et al.  Logic-Based Modeling Approaches for Qualitative and Hybrid Reasoning in Dynamic Spatial Systems , 2015, ACM Comput. Surv.

[24]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[26]  Galen C. Hunt,et al.  Helios: heterogeneous multiprocessing with satellite kernels , 2009, SOSP '09.

[27]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[28]  Gang Lu,et al.  On Horizontal Decomposition of the Operating System , 2016, ArXiv.

[29]  Mark Silberstein,et al.  GPUnet , 2014, OSDI.

[30]  Jean-Loup Baer.  Multiprocessing Systems , 1976, IEEE Transactions on Computers.

[31]  Brett D. Fleisch,et al.  Workplace microkernel and OS: a case study , 1998, Softw. Pract. Exp.

[32]  Binoy Ravindran,et al.  Thread Migration in a Replicated-Kernel OS , 2015, 2015 IEEE 35th International Conference on Distributed Computing Systems.

[33]  Robert W. Brodersen,et al.  BORPH: an operating system for FPGA-based reconfigurable computers , 2007 .

[34]  Steven Swanson,et al.  Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.

[35]  Rolf Riesen,et al.  mOS: an architecture for extreme-scale operating systems , 2014, ROSS@ICS.

[36]  Franz Franchetti,et al.  HAMLeT Architecture for Parallel Data Reorganization in Memory , 2016, IEEE Micro.

[37]  Christoforos E. Kozyrakis,et al.  Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[38]  Binoy Ravindran,et al.  Breaking the Boundaries in Heterogeneous-ISA Datacenters , 2017, ASPLOS.

[39]  Adrian Schüpbach,et al.  The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.

[40]  Steven Swanson,et al.  Morpheus: Creating Application Objects Efficiently for Heterogeneous Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[41]  Anant Agarwal,et al.  Factored operating systems (fos): the case for a scalable operating system for multicores , 2009, OPSR.

[42]  Tong Li,et al.  Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[43]  Trent Jaeger,et al.  The SawMill multiserver approach , 2000, EW 9.

[44]  Gary J. Nutt.  A Parallel Processor Operating System Comparison , 1977, IEEE Transactions on Software Engineering.

[45]  Zhen Wang,et al.  K2 , 2015 .

[46]  Yang Liu,et al.  Willow: A User-Programmable SSD , 2014, OSDI.

[47]  Dejan S. Milojicic,et al.  Not Your Parents' Physical Address Space , 2015, HotOS.

[48]  Abhishek Bhattacharjee,et al.  Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.

[49]  Yutaka Ishikawa,et al.  On the Scalability, Performance Isolation and Device Driver Transparency of the IHK/McKernel Hybrid Lightweight Kernel , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[50]  Carl Ramey,et al.  TILE-Gx100 ManyCore processor: Acceleration interfaces and architecture , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).

[51]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[52]  Joel H. Saltz,et al.  Active disks: programming model, algorithms and evaluation , 1998, ASPLOS VIII.

[53]  Binoy Ravindran,et al.  Popcorn: bridging the programmability gap in heterogeneous-ISA platforms , 2015, EuroSys.

[54]  David Chisnall.  There’s No Such Thing as a General-purpose Processor , 2014, ACM Queue.

[55]  A. Barbalace.  Popcorn: a replicated-kernel OS based on Linux , 2014 .

[56]  Ben Leslie.  GrailOS: A micro-kernel based, multi-server, multi-personality operating system , 2006 .

[57]  Tejas Karkhanis,et al.  Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev.

[58]  Paolo Cappelletti,et al.  Non volatile memory evolution and revolution , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).

[59]  David A. Patterson,et al.  A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness , 2013, ISCA.