Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights
Wenguang Chen | Onur Mutlu | Jidong Zhai | Feng Zhang | Xipeng Shen
[1] Fabio Petroni, et al. HDRF: Stream-Based Partitioning for Power-Law Graphs, 2015, CIKM.
[2] D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.
[3] Charles Elkan, et al. The Field Matching Problem: Algorithms and Applications, 1996, KDD.
[4] Craig G. Nevill-Manning, et al. Inferring Sequential Structure, 1996.
[5] Jan O. Pedersen, et al. Optimization for dynamic inverted index maintenance, 1989, SIGIR '90.
[6] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[7] Ion Stoica, et al. Succinct: Enabling Queries on Compressed Data, 2015, NSDI.
[8] Gonzalo Navarro, et al. Compact Data Structures - A Practical Approach, 2016.
[9] Gonzalo Navarro, et al. A guided tour to approximate string matching, 2001, CSUR.
[10] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[11] Ian H. Witten, et al. Identifying Hierarchical Structure in Sequences: A linear-time algorithm, 1997, J. Artif. Intell. Res.
[12] Petros Efstathopoulos, et al. Building a High-performance Deduplication System, 2011, USENIX Annual Technical Conference.
[13] Onur Mutlu, et al. Potential of A Method for Text Analytics Directly on Compressed Data, 2017.
[14] Sheeva Afshan, et al. Using compression algorithms to support the comprehension of program traces, 2010, WODA '10.
[15] Samuel Madden, et al. Processing Analytical Queries over Encrypted Data, 2013, Proc. VLDB Endow.
[16] Joseph M. Hellerstein, et al. Potter's Wheel: An Interactive Data Cleaning System, 2001, VLDB.
[17] Hao Tang, et al. Provenance graph query method based on double layer index structure, 2017.
[18] Reynold Xin, et al. GraphX: a resilient distributed graph system on Spark, 2013, GRADES.
[19] Seyong Lee, et al. PUMA: Purdue MapReduce Benchmarks Suite, 2012.
[20] Ian H. Witten, et al. Linear-time, incremental hierarchy inference for compression, 1997, Proceedings DCC '97. Data Compression Conference.
[21] Craig G. Nevill-Manning, et al. Compression and Explanation Using Hierarchical Grammars, 1997, Comput. J.
[22] J. Larus. Whole program paths, 1999, PLDI '99.
[23] Gregg Rothermel, et al. Whole program path-based dynamic impact analysis, 2003, 25th International Conference on Software Engineering, 2003. Proceedings.
[24] Brad Calder, et al. Motivation for Variable Length Intervals and Hierarchical Phase Behavior, 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[25] Joseph Gonzalez, et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, 2012, OSDI.
[26] James W. Pennebaker, et al. Linguistic Inquiry and Word Count (LIWC2007), 2007.
[27] Joshua Evan Blumenstock, et al. Size matters: word count as a measure of quality on wikipedia, 2008, WWW.
[28] Meng He, et al. Indexing Compressed Text, 2003.
[29] Xipeng Shen, et al. Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction, 2017, PLDI.
[30] Uri Zernik, et al. Lexical acquisition: Exploiting on-line resources to build a lexicon, 1991.
[31] Martin Hirzel, et al. Dynamic hot data stream prefetching for general-purpose programs, 2002, PLDI '02.
[32] Rodrigo González, et al. Compressed text indexes: From theory to practice, 2007, JEAL.
[33] Abraham Silberschatz, et al. Operating System Concepts Essentials, 2010.
[34] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[35] Jie Huang, et al. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).
[36] Trishul M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality, 2001, PLDI '01.
[37] Claudio Martella, et al. Practical Graph Analytics with Apache Giraph, 2015, Apress.
[38] 황규영, et al. Inverted index storage structure using subindexes and large objects for tight coupling of information retrieval with database management systems, 2002.
[39] Quanzhong Li, et al. Supporting efficient query processing on compressed XML files, 2005, SAC '05.
[40] David J. DeWitt, et al. On supporting containment queries in relational database management systems, 2001, SIGMOD '01.
[41] Ludovic Lebart. Classification problems in text analysis and information retrieval, 1998.
[42] Torsten Suel, et al. Inverted index compression and query processing with optimized document ordering, 2009, WWW '09.
[43] Erhard Rahm, et al. Data Cleaning: Problems and Current Approaches, 2000, IEEE Data Eng. Bull.
[44] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.
[45] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[46] Wenguang Chen, et al. Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data, 2018, ICS.
[47] Seif Haridi, et al. Apache Flink™: Stream and Batch Processing in a Single Engine, 2015, IEEE Data Eng. Bull.
[48] Roberto Grossi, et al. When indexing equals compression: experiments with compressing suffix arrays and applications, 2004, SODA '04.
[49] Bradford Nichols, et al. Pthreads programming - a POSIX standard for better multiprocessing, 1996.