Cache What You Need to Cache

SSDs play an important role in caching systems owing to their high performance-to-cost ratio. Because the cache is typically at least an order of magnitude smaller than the backend storage, the write density (defined as writes per unit time and space) of the SSD cache is much higher than that of the HDD storage, which poses a serious challenge to SSD lifetime. Meanwhile, under social network workloads, many writes to the SSD cache are unnecessary. For example, our study of Tencent’s photo caching shows that about 61% of all photos are accessed only once, yet they are still swapped in and out of the cache. If we can identify such photos proactively and prevent them from entering the cache, we can eliminate unnecessary SSD cache writes and improve cache space utilization. To this end, we put forward a “one-time-access criteria” that is applied to the cache space and further propose a “one-time-access-exclusion” policy. Based on these two techniques, we design a prediction-based classifier to support the policy. Unlike state-of-the-art history-based predictions, our prediction does not rely on access history, which makes good prediction accuracy hard to achieve. To address this issue, we integrate a decision tree into the classifier, extract social-related information as classification features, and apply cost-sensitive learning to improve classification precision. With these techniques, we attain a prediction accuracy greater than 80%. Experimental results show that the one-time-access-exclusion approach improves cache performance in most respects. Taking LRU as an example, applying our approach improves the hit rate by 4.4%, decreases cache writes by 56.8%, and cuts the average access latency by 5.5%.
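
The exclusion idea can be illustrated with a minimal sketch, which is not the authors' implementation: an LRU cache that consults a predictor on every miss and bypasses the cache when the object is predicted to be accessed only once. The names AdmissionLRU, predict_one_time, and run are hypothetical, and the oracle predictor below (which counts accesses in the whole trace) is an idealized stand-in for the decision-tree classifier described above.

```python
from collections import Counter, OrderedDict
from typing import Callable, Hashable, Iterable, Tuple


class AdmissionLRU:
    """LRU cache that refuses to admit keys predicted to be one-time-access."""

    def __init__(self, capacity: int,
                 predict_one_time: Callable[[Hashable], bool]) -> None:
        self.capacity = capacity
        self.predict_one_time = predict_one_time  # hypothetical predictor hook
        self.items: "OrderedDict[Hashable, bool]" = OrderedDict()
        self.hits = 0
        self.writes = 0  # number of insertions into the (SSD) cache

    def access(self, key: Hashable) -> bool:
        """Return True on a hit; on a miss, admit the key unless excluded."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        if self.predict_one_time(key):
            return False  # bypass: serve from backend, no cache write
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        self.items[key] = True
        self.writes += 1
        return False


def run(trace: Iterable[Hashable], capacity: int,
        predictor: Callable[[Hashable], bool]) -> Tuple[int, int]:
    """Replay a trace and return (hits, cache writes)."""
    cache = AdmissionLRU(capacity, predictor)
    for key in trace:
        cache.access(key)
    return cache.hits, cache.writes


# Toy trace (made up for illustration): "a" and "b" recur, "x*" appear once.
trace = ["a", "x1", "b", "x2", "a", "x3", "b", "x4", "a", "b"]
counts = Counter(trace)

plain = run(trace, 2, lambda key: False)                # ordinary LRU
excluded = run(trace, 2, lambda key: counts[key] == 1)  # oracle exclusion

print("plain LRU     (hits, writes):", plain)
print("with exclusion (hits, writes):", excluded)
```

On this toy trace the one-time objects keep evicting the recurring ones under plain LRU, while the excluding variant keeps the recurring objects resident and writes far less. A real predictor would be imperfect, so the gains would be smaller than with the oracle used here.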
