Paper information - HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware

This paper proposes HOCT, an efficient algorithm for CRF training on modern computer architectures. First, HOCT uses software prefetching to hide cache-miss latency. Second, it exploits SIMD instructions to process data in parallel. Third, for large data sets, HOCT itself, rather than the operating system, manages swapping. Experiments on various real data sets show that HOCT yields a fourfold speedup when the data fits in memory, and a more than 30-fold speedup when the memory requirement exceeds physical memory.
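The third idea, letting the program rather than the operating system decide which data is resident in memory, can be illustrated with a minimal out-of-core sketch. This is not HOCT's swapping scheme, which the abstract does not describe. It only shows the general pattern of refilling one preallocated buffer with explicit reads. The names `RECORD`, `write_records`, and `chunked_sum`, the float64 record layout, and the chunk size are all assumptions made for illustration.

```python
import os
import struct
import tempfile

# Assumed on-disk layout: one little-endian float64 per record.
RECORD = struct.Struct("<d")


def write_records(path, values):
    """Write the values to disk as packed float64 records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def chunked_sum(path, records_per_chunk):
    """Sum all records while holding at most one chunk in memory.

    The program, not the OS pager, chooses which block is resident:
    a single preallocated buffer is refilled with explicit reads.
    Returns (total, number_of_chunks_read).
    """
    buf = bytearray(RECORD.size * records_per_chunk)  # reused for every chunk
    view = memoryview(buf)
    total = 0.0
    chunks = 0
    with open(path, "rb") as f:
        while True:
            n = f.readinto(buf)  # explicit load of the next block
            if not n:
                break  # end of file
            # Only the first n bytes are valid on the final, partial chunk.
            for (v,) in RECORD.iter_unpack(view[:n]):
                total += v
            chunks += 1
    return total, chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "records.bin")
        write_records(data_path, [float(i) for i in range(10)])
        print(chunked_sum(data_path, 4))  # prints (45.0, 3)
```

Ten records read four at a time take three explicit loads (4 + 4 + 2), and peak memory is bounded by the chunk size rather than the file size. A real trainer would presumably also choose the load order to match its access pattern, which is where an application can do better than a generic page-replacement policy.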

Wei Zhang | Feng Gao | Lei Chang | Jianqing Ma | Tianyuan Chen
