A Novel Method to Improve Hit Rate for Big Data Quick Reading

In big data mining analysis, the data records in a dataset are retrieved randomly. Distributed storage systems such as BigTable and HBase provide a cache policy for file blocks in retrieval operations. Because the requested records are scattered across different file blocks, the block cache does not achieve a high hit rate. To address this problem, we propose an LRU-based double queue K-frequency cache method (DLK). The method uses a double queue storage structure that applies different storage and eviction rules to data with different access frequencies (i.e., high or low access frequency). It also divides memory into a data area and a list area and adopts a different data structure for each to reduce the time spent on data retrieval and data processing. Experimental results show that the proposed method reduces retrieval time by 30% when the cache mechanism is used. Compared with existing methods, DLK improves the hit rate by 60.1% and reduces retrieval time by 43.5%. With smaller cache capacities, our method also outperforms the other algorithms.
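The double queue idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give DLK's exact storage and eviction rules, so the class name `DoubleQueueKCache`, the `cold`/`hot` queue names, the promotion threshold `k`, and the demotion of evicted hot records back to the cold queue are all assumptions made for illustration. The sketch also does not model the separate data area and list area.

```python
from collections import OrderedDict


class DoubleQueueKCache:
    """Illustrative two-queue LRU cache (hypothetical, not the paper's DLK).

    Records start in a low-frequency ("cold") LRU queue and are promoted
    to a high-frequency ("hot") LRU queue once accessed k times.
    """

    def __init__(self, cold_capacity, hot_capacity, k=2):
        self.cold_capacity = cold_capacity
        self.hot_capacity = hot_capacity
        self.k = k
        self.cold = OrderedDict()  # key -> (value, access_count), oldest first
        self.hot = OrderedDict()   # key -> value, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency in the hot queue
            self.hits += 1
            return self.hot[key]
        if key in self.cold:
            value, count = self.cold.pop(key)
            count += 1
            self.hits += 1
            if count >= self.k:
                self._insert_hot(key, value)     # assumed promotion rule
            else:
                self.cold[key] = (value, count)  # re-insert as most recent
            return value
        self.misses += 1
        return None

    def put(self, key, value):
        """Insert or update a record; loading a record counts as one access."""
        if key in self.hot:
            self.hot[key] = value
            self.hot.move_to_end(key)
            return
        count = 1
        if key in self.cold:
            _, count = self.cold.pop(key)  # keep the existing access count
        self.cold[key] = (value, count)
        self._trim_cold()

    def _insert_hot(self, key, value):
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            # Assumed rule: the least recently used hot record is demoted
            # to the cold queue with its access count reset.
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = (old_value, 1)
            self._trim_cold()

    def _trim_cold(self):
        # Evict least recently used cold records until within capacity.
        while len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch a record that is read only once stays in the cold queue and is evicted first, while repeatedly read records are kept in the hot queue, which is one plausible way to apply different rules to low-frequency and high-frequency data.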
