Block Access Pattern Discovery via Compressed Full Tensor Transformer

Discovering and predicting block access patterns in hybrid storage systems is crucial for effective tier management. Existing methods are usually heuristic and cannot handle complex patterns. This work introduces transformers to block access pattern prediction. We observe that block accesses in tier management systems are aggregated temporally and spatially into multivariate time series of block access frequencies, which relaxes the runtime requirements and makes complex models practical to deploy. However, the enormous number of rarely accessed blocks in storage systems, combined with the structure of conventional transformer models, leads to millions of redundant parameters and makes such models impractical to deploy. We therefore incorporate Tensor-Train Decomposition (TTD) into the transformer and propose the Compressed Full Tensor Transformer (CFTT), in which all linear layers of the vanilla transformer are replaced with tensor-train layers. The weights of the input and output layers are shared to further reduce parameters and implicitly reuse knowledge. CFTT significantly reduces model size and computation cost, which is critical for saving storage space and inference time. Extensive experiments on synthetic and real-world datasets demonstrate that transformers consistently achieve state-of-the-art performance in terms of top-k hit rates. Moreover, the proposed CFTT compresses transformers by 16× to 461× and speeds up inference by 5× without sacrificing overall performance, facilitating its application to tier management in hybrid storage systems.
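To make the compression idea concrete, the sketch below shows a generic tensor-train linear layer in PyTorch, i.e., the kind of layer that replaces each dense linear layer in CFTT. This is a minimal illustration, not the authors' implementation: the class name `TTLinear`, the initialization scale, and the mode/rank shapes are all hypothetical and chosen only to show how TTD trades one large weight matrix for a chain of small cores.

```python
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Minimal tensor-train (TT) factorization of a dense linear layer (sketch).

    The full weight matrix of shape (prod(in_modes), prod(out_modes)) is never
    materialized; it is represented by one small 4-way core per mode, where
    core k has shape (rank_{k-1}, in_modes[k], out_modes[k], rank_k) and
    rank_0 = rank_d = 1.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.in_modes = tuple(in_modes)
        self.out_modes = tuple(out_modes)
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k],
                                           out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])

    def forward(self, x):
        # x: (batch, prod(in_modes)) -> (batch, prod(out_modes))
        batch = x.shape[0]
        # Flattened layout of h: (current rank, remaining input modes, batch, produced output modes).
        h = x.t().reshape(-1)                       # order (i_1, ..., i_d, batch), leading rank = 1
        for core in self.cores:
            r_prev, i_k, o_k, r_next = core.shape
            h = h.reshape(r_prev * i_k, -1)         # expose (rank, i_k) for contraction
            h = core.reshape(r_prev * i_k, o_k * r_next).t() @ h
            h = h.reshape(o_k, -1).t().reshape(-1)  # move the new output mode to the back
        # Final layout: (rank_d = 1, batch, o_1, ..., o_d).
        return h.reshape(batch, -1)


# Hypothetical shapes: a 1024 -> 4096 dense layer (~4.2M weights) factorized with
# in_modes (8, 8, 16), out_modes (16, 16, 16), and TT ranks (1, 4, 4, 1) needs only
# 1*8*16*4 + 4*8*16*4 + 4*16*16*1 = 3,584 core parameters.
layer = TTLinear(in_modes=(8, 8, 16), out_modes=(16, 16, 16), ranks=(1, 4, 4, 1))
y = layer(torch.randn(32, 8 * 8 * 16))             # y.shape == (32, 4096)
```

The input/output weight sharing described in the abstract would amount to reusing the same set of TT cores for both the embedding-like input layer and the output projection; how CFTT ties those weights exactly is specified by the paper, not by this sketch.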
