Computing n-gram statistics in MapReduce
[1] Ian H. Witten, et al. Managing gigabytes (2nd ed.): compressing and indexing documents and images, 1999.
[2] ChengXiang Zhai, et al. Statistical Language Models for Information Retrieval: A Critical Review, 2008, Found. Trends Inf. Retr.
[3] Edward Y. Chang, et al. Pfp: parallel fp-growth for query recommendation, 2008, RecSys '08.
[4] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.
[5] Justin Zobel, et al. Accurate discovery of co-derivative documents via duplicate text detection, 2006, Inf. Syst.
[6] Jianfeng Gao, et al. MSRLM: a Scalable Language Modeling Toolkit, 2007.
[7] Mohammed J. Zaki, et al. SPADE: An Efficient Algorithm for Mining Frequent Sequences, 2004, Machine Learning.
[8] Gerhard Weikum, et al. Query Relaxation for Entity-Relationship Search, 2011, ESWC.
[9] Srinivasan Parthasarathy, et al. Parallel Data Mining for Association Rules on Shared-memory Systems, 1998.
[10] Srikanta J. Bedathur, et al. Temporal index sharding for space-time efficiency in archive search, 2011, SIGIR.
[11] Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer, 1987, IEEE Trans. Acoust. Speech Signal Process.
[12] Oliver Grau, et al. How Not to Be Seen - Inpainting Dynamic Objects in Crowded Scenes, 2011.
[13] W. Bruce Croft, et al. Efficient indexing of repeated n-grams, 2011, WSDM '11.
[14] Nizar R. Mabroukeh, et al. A taxonomy of sequential pattern mining algorithms, 2010, CSUR.
[15] Satanjeev Banerjee, et al. The Design, Implementation, and Use of the Ngram Statistics Package, 2003, CICLing.
[16] Tom White, et al. Hadoop: The Definitive Guide, 2009.
[17] John F. Roddick, et al. Association mining, 2006, CSUR.
[18] Aristides Gionis, et al. Social Content Matching in MapReduce, 2011, Proc. VLDB Endow.
[19] Jiawei Han, et al. Frequent pattern mining: current status and future directions, 2007, Data Mining and Knowledge Discovery.
[20] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.
[21] Mauro Cettolo, et al. IRSTLM: an open source toolkit for handling large scale language models, 2008, INTERSPEECH.
[22] Gerhard Weikum, et al. A Language Modeling Approach for Temporal Information Needs, 2010, ECIR.
[23] Jian Pei, et al. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).
[24] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.
[25] Carsten Stoll. Optical reconstruction of detailed animatable human body models, 2009.
[26] Jimmy J. Lin. Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce, 2009, SIGIR.
[27] Jianyong Wang, et al. Mining sequential patterns by pattern-growth: the PrefixSpan approach, 2004, IEEE Transactions on Knowledge and Data Engineering.
[28] Hans-Peter Seidel, et al. Construction of smooth maps with mean value coordinates, 2007.
[29] Heng Ji, et al. New Tools for Web-Scale N-grams, 2010, LREC.
[30] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.
[31] Kenneth Ward Church, et al. Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus, 2001, Computational Linguistics.
[32] Ravi Kumar, et al. Max-cover in map-reduce, 2010, WWW '10.
[33] Rada Mihalcea, et al. An Efficient Indexer for Large N-Gram Corpora, 2011, ACL.
[34] Björn-Olav Dozo, et al. Quantitative Analysis of Culture Using Millions of Digitized Books, 2010.
[35] Thorsten Brants, et al. Large Language Models in Machine Translation, 2007, EMNLP.
[36] Ian H. Witten, et al. Managing Gigabytes: Compressing and Indexing Documents and Images, 1999.
[37] Peter Fankhauser, et al. Boilerplate detection using shallow text features, 2010, WSDM '10.
[38] William R. Hersh, et al. Managing Gigabytes—Compressing and Indexing Documents and Images (Second Edition), 2001, Information Retrieval.
[39] Ming-Syan Chen, et al. DPSP: Distributed Progressive Sequential Pattern Mining on the Cloud, 2010, PAKDD.
[40] Martin Theobald, et al. Top-k query processing in probabilistic databases with non-materialized views, 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[41] Mirek Riedewald, et al. Processing theta-joins using MapReduce, 2011, SIGMOD '11.
[42] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.
[43] Jimmy J. Lin, et al. Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, 2010, CL.
[44] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.
[45] Valerie Guralnik, et al. Parallel tree-projection-based sequence mining algorithms, 2004, Parallel Comput.
[46] Sivan Toledo, et al. Characterizing the Performance of Flash Memory Storage Devices and Its Impact on Algorithm Design, 2008, WEA.
[47] Xiaolong Li, et al. An Overview of Microsoft Web N-gram Corpus and Applications, 2010, NAACL.
[48] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[49] Mohammed J. Zaki. Parallel Sequence Mining on Shared-Memory Machines, 1999, J. Parallel Distributed Comput.