Scalable multi-access flash store for big data analytics

For many "Big Data" applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where it can be processed, i.e., DRAM. In this paper we examine an architecture for a scalable distributed flash store that aims to overcome this limitation in two ways. First, the architecture provides high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible compared to flash access time. Second, it permits some computation near the data via an FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and in which the average latency for user software to access the flash store is less than 70 µs, including 3.5 µs of network overhead.
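
As a rough check on the claim that network latency is negligible, the two reported figures can be compared directly. The symbols below are labels introduced here for illustration, not notation from the paper: the network overhead is written as t_net and the reported upper bound on average access latency as t_max.

```latex
\[
  \frac{t_{\mathrm{net}}}{t_{\mathrm{max}}}
  = \frac{3.5\,\mu\mathrm{s}}{70\,\mu\mathrm{s}}
  = 0.05
\]
```

Measured against the 70 µs bound, the network accounts for about 5% of the access latency. Because the reported average is below 70 µs, the actual share is somewhat above 5%, but it remains a small fraction of the total.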
