[1] Jeffrey F. Naughton,et al. Sampling-Based Estimation of the Number of Distinct Values of an Attribute , 1995, VLDB.
[2] Luca Trevisan,et al. Counting Distinct Elements in a Data Stream , 2002, RANDOM.
[3] Jian Zhang,et al. On the use of words and n-grams for Chinese information retrieval , 2000, IRAL '00.
[4] Eduard H. Hovy,et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.
[5] Anssi Klapuri,et al. Conventional and periodic N-grams in the transcription of drum sequences , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings.
[6] N. Cercone. CNG Method with Weighted Voting , 2004 .
[7] Kamel Aouiche,et al. Unassuming View-Size Estimation Techniques in OLAP , 2007, ArXiv.
[8] Richard M. Karp,et al. Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..
[9] Jeffrey F. Naughton,et al. Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies , 1996, VLDB.
[10] Mark Allen Weiss,et al. Data structures and algorithm analysis in Ada , 1993 .
[11] Sudipto Guha,et al. Streaming and sublinear approximation of entropy and information distances , 2005, SODA '06.
[12] Alon Orlitsky,et al. Always Good Turing: Asymptotically Optimal Probability Estimation , 2003, Science.
[13] George Karypis,et al. Selective Markov models for predicting Web page accesses , 2004, TOIT.
[14] Jae-Gil Lee,et al. n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure , 2005, VLDB.
[15] Christos Faloutsos,et al. Modeling Skewed Distribution Using Multifractals and the '80-20' Law , 1996, VLDB.
[16] P. Flajolet,et al. Loglog counting of large cardinalities , 2003 .
[17] Makoto Nagao,et al. A New Method of N-gram Statistics for Large Number of n and Automatic Extraction of Words and Phrases from Large Text Data of Japanese , 1994, COLING.
[18] Andrew Rau-Chaplin,et al. The cgmCUBE project: Optimizing parallel data cube generation for ROLAP , 2006, Distributed and Parallel Databases.
[19] David A. McAllester,et al. On the Convergence Rate of Good-Turing Estimators , 2000, COLT.
[20] Toby J. Teorey,et al. A Pareto Model for OLAP View Size Estimation , 2001, Inf. Syst. Frontiers.
[21] Min Zhang,et al. Improving Language Model Size Reduction using Better Pruning Criteria , 2002, ACL.
[22] Ronitt Rubinfeld,et al. The complexity of approximating entropy , 2002, STOC '02.
[23] Claude E. Shannon,et al. A Mathematical Theory of Communication , 1948 .
[24] Stan Matwin,et al. A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization , 2001 .
[25] Dan Sullivan,et al. Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales , 2001 .
[26] George Marsaglia,et al. Toward a universal random number generator , 1987 .
[27] Matteo Golfarelli,et al. On Estimating the Cardinality of Aggregate Views , 2001, DMDW.
[28] F. James. A Review of Pseudorandom Number Generators , 1990 .
[29] Emmanuel J. Yannakoudakis,et al. n-Grams and their implication to natural language understanding , 1990, Pattern Recognit..
[30] Ronitt Rubinfeld,et al. On the learnability of discrete distributions , 1994, STOC '94.
[31] Patrick Brennan,et al. A Prototype for Authorship Attribution Studies , 2006, Lit. Linguistic Comput..
[32] Jeffrey Scott Vitter,et al. Random sampling with a reservoir , 1985, TOMS.
[33] Jinho Lee,et al. On the design and evaluation of a multi-dimensional approach to information retrieval (poster session) , 2000, Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[34] Gaston H. Gonnet,et al. An Analysis of the Karp-Rabin String Matching Algorithm , 1990, Inf. Process. Lett..
[35] Matteo Golfarelli,et al. Bounding the cardinality of aggregate views through domain-derived constraints , 2003, Data Knowl. Eng..
[36] Philippe Flajolet,et al. Probabilistic Counting Algorithms for Data Base Applications , 1985, J. Comput. Syst. Sci..
[37] Eugene W. Myers,et al. Suffix arrays: a new method for on-line string searches , 1993, SODA '90.
[38] Yorick Wilks,et al. The Virtual Corpus Approach to Deriving Ngram Statistics from Large Scale Corpora , 2002 .
[39] Qiang Yang,et al. WhatNext: a prediction system for Web requests using n-gram sequence models , 2000, Proceedings of the First International Conference on Web Information Systems Engineering.
[40] Stefan M. Rüger,et al. Position Indexing of Adjacent and Concurrent N-Grams for Polyphonic Music Retrieval , 2003, ISMIR.
[41] R. P. Jagadeesh Chandra Bose,et al. Data mining approaches to software fault diagnosis , 2005, 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'05).
[42] Wing-Kai Hon,et al. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices , 2009, SIAM J. Comput..
[43] Robert A. Stryk. Uniform random number generator , 1976, SIML.
[44] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .
[45] Robert Giegerich,et al. Efficient implementation of lazy suffix trees , 1999, Softw. Pract. Exp..
[46] Michel Benard. À juste titre: a Lexicometric Approach to the Study of Titles , 1995 .
[47] Noga Alon,et al. The space complexity of approximating the frequency moments , 1996, STOC '96.
[48] Owen Kaser,et al. Analyzing Large Collections of Electronic Text Using OLAP , 2006, ArXiv.
[49] Peter Sanders,et al. Better external memory suffix array construction , 2008, JEAL.
[50] Douglas W. Oard,et al. Textual Data Mining to Support Science and Technology Management , 2000, Journal of Intelligent Information Systems.
[51] Larry Carter,et al. Universal Classes of Hash Functions , 1979, J. Comput. Syst. Sci..
[52] Timo Niemi,et al. Multidimensional Data Model and Query Language for Informetrics , 2003, J. Assoc. Inf. Sci. Technol..
[53] Hamid Pirahesh,et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.
[54] Philippe Flajolet,et al. Loglog Counting of Large Cardinalities (Extended Abstract) , 2003, ESA.
[55] Amita Goyal Chin,et al. Text databases & document management: theory & practice , 2001 .
[56] Aravind Srinivasan,et al. Chernoff-Hoeffding bounds for applications with limited independence , 1995, SODA '93.
[57] Kim-Hung Li,et al. Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n))) , 1994, TOMS.
[58] Paul R. Cohen,et al. Unsupervised segmentation of categorical time series into episodes , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..
[59] Michael Droettboom. Correcting broken characters in the recognition of historical printed documents , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..
[60] Kyu-Young Whang,et al. A linear-time probabilistic counting algorithm for database applications , 1990, TODS.
[61] Xiaohui Yu,et al. Towards estimating the number of distinct value combinations for a set of attributes , 2005, CIKM '05.
[62] Srikanta Tirthapura,et al. Estimating simple functions on the union of data streams , 2001, SPAA '01.
[63] Panos M. Pardalos,et al. Handbook of Massive Data Sets , 2002, Massive Computing.
[64] Bernard Dousset,et al. DocCube: Multi-dimensional visualisation and exploration of large document sets , 2003, J. Assoc. Inf. Sci. Technol..
[65] Owen Kaser,et al. The LitOLAP Project: Data Warehousing with Literature , 2006 .
[66] Takuji Nishimura,et al. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator , 1998, TOMC.
[67] Jonathan D. Cohen,et al. Recursive hashing functions for n-grams , 1997, TOIS.
[68] Michael Kolonko,et al. Sequential reservoir sampling with a nonuniform distribution , 2006, TOMS.
[69] 위영철,et al. Data compression apparatus and method , 2007 .