Principled dictionary pruning for low-memory corpus compression
Justin Zobel | Anthony Wirth | Jiancong Tong
[1] Abraham Lempel,et al. A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.
[2] James A. Storer,et al. Data compression via textual substitution , 1982, JACM.
[3] Richard M. Karp,et al. Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev.
[4] Neri Merhav,et al. A measure of relative entropy between individual sequences with application to universal classification , 1993, IEEE Trans. Inf. Theory.
[5] Ian H. Witten,et al. Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .
[6] Walter F. Tichy,et al. Delta algorithms: an empirical analysis , 1998, TSEM.
[7] A. Moffat,et al. Offline dictionary-based compression , 2000, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).
[8] Hugh E. Williams,et al. Compressing Integers for Fast File Access , 1999, Comput. J.
[9] Ian H. Witten,et al. Managing gigabytes (2nd ed.): compressing and indexing documents and images , 1999 .
[10] Ricardo A. Baeza-Yates,et al. Compression: A Key for Next-Generation Text Retrieval Systems , 2000, Computer.
[11] Hugh E. Williams,et al. General-purpose compression for efficient retrieval , 2001, J. Assoc. Inf. Sci. Technol.
[12] Nasir D. Memon,et al. Cluster-based delta compression of a collection of files , 2002, Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.
[13] Hugh E. Williams,et al. A general-purpose compression scheme for large collections , 2002, TOIS.
[14] William R. Hersh,et al. Managing Gigabytes—Compressing and Indexing Documents and Images (Second Edition) , 2001, Information Retrieval.
[15] Torsten Suel,et al. Improved file synchronization techniques for maintaining large replicated collections over slow networks , 2004, Proceedings. 20th International Conference on Data Engineering.
[16] Charles L. A. Clarke,et al. Overview of the TREC 2004 Terabyte Track , 2004, TREC.
[17] Fred Douglis,et al. Redundancy Elimination Within Large Collections of Files , 2004, USENIX Annual Technical Conference, General Track.
[18] Szymon Grabowski,et al. Revisiting dictionary-based compression , 2005, Softw. Pract. Exp.
[19] Sebastian Deorowicz,et al. Revisiting dictionary-based compression: Research Articles , 2005 .
[20] Justin Zobel,et al. Inverted files for text search engines , 2006, CSUR.
[21] Gonzalo Navarro,et al. Compressed full-text indexes , 2007, CSUR.
[22] Gang Chen,et al. Lempel–Ziv Factorization Using Less Time & Space , 2008, Math. Comput. Sci.
[23] W. Bruce Croft,et al. Search Engines - Information Retrieval in Practice , 2009 .
[24] Charles L. A. Clarke,et al. Information Retrieval - Implementing and Evaluating Search Engines , 2010 .
[25] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.
[26] Justin Zobel,et al. Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval , 2010, SPIRE.
[27] Giovanni Manzini,et al. On compressing the textual web , 2010, WSDM '10.
[28] Ricardo Baeza-Yates,et al. Modern Information Retrieval - the concepts and technology behind search, Second edition , 2011 .
[29] Justin Zobel,et al. Collection-based compression using discovered long matching strings , 2011, CIKM '11.
[30] Justin Zobel,et al. Reference Sequence Construction for Relative Compression of Genomes , 2011, SPIRE.
[31] Justin Zobel,et al. Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections , 2011, Proc. VLDB Endow.
[32] Justin Zobel,et al. Optimized Relative Lempel-Ziv Compression of Genomes , 2011, ACSC.
[33] Justin Zobel,et al. Sample selection for dictionary-based corpus compression , 2011, SIGIR '11.
[34] Justin Zobel,et al. Iterative Dictionary Construction for Compression of Large DNA Data Sets , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[35] Philip Shilane,et al. WAN-optimized replication of backup datasets using stream-informed delta compression , 2012, TOS.