Text Mining Using Data Compression Models
[1] Pat Langley, et al. Static Versus Dynamic Sampling for Data Mining, 1996, KDD.
[2] Marcus Hutter. Universal Learning Theory, 2010, Encyclopedia of Machine Learning.
[3] Sarah Jane Delany, et al. Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches, 2006, Artificial Intelligence Review.
[4] Stanley F. Chen, et al. Conditional and joint models for grapheme-to-phoneme conversion, 2003, INTERSPEECH.
[5] C.-C. Jay Kuo, et al. A new initialization technique for generalized Lloyd iteration, 1994, IEEE Signal Processing Letters.
[6] Andrew McCallum, et al. Maximum Entropy Markov Models for Information Extraction and Segmentation, 2000, ICML.
[7] ChengXiang Zhai, et al. Active Feedback - UIUC TREC-2003 HARD Experiments, 2003, TREC.
[8] Alexander J. Smola, et al. Online learning with kernels, 2001, IEEE Transactions on Signal Processing.
[9] David J. Harper, et al. Using compression based language models for text categorization, 2003.
[10] G. W. Milligan, et al. The validation of four ultrametric clustering algorithms, 1980, Pattern Recognit.
[11] Eamonn J. Keogh, et al. Towards parameter-free data mining, 2004, KDD.
[12] Marina Meila, et al. An Experimental Comparison of Several Clustering and Initialization Methods, 1998, UAI.
[13] Alistair Moffat, et al. Implementing the PPM data compression scheme, 1990, IEEE Trans. Commun.
[14] James Allan, et al. Topic detection and tracking: event-based information organization, 2002.
[15] Mehmet M. Dalkilic, et al. Using Compression to Identify Classes of Inauthentic Texts, 2006, SDM.
[16] Ning Wu, et al. On Compression-Based Text Classification, 2005, ECIR.
[17] Yingying Wen, et al. A compression based algorithm for Chinese word segmentation, 2000, CL.
[18] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.
[19] Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.
[20] D. Sculley, et al. Relaxed Online SVMs in the TREC Spam Filtering Track, 2007, TREC.
[21] Cole Trapnell, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, 2009, Genome Biology.
[22] Timothy J. Hazen, et al. Discriminative feature weighting using MCE training for topic identification of spoken audio recordings, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Gordon V. Cormack, et al. Spam Corpus Creation for TREC, 2005, CEAS.
[24] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.
[25] J. Rissanen, et al. Modeling By Shortest Data Description, 1978, Autom.
[26] William John Teahan, et al. A repetition based measure for verification of text collections and for text categorization, 2003, SIGIR.
[27] Enrico Blanzieri, et al. A survey of learning-based techniques of email spam filtering, 2008, Artificial Intelligence Review.
[28] Dale Schuurmans, et al. Text Classification in Asian Languages without Word Segmentation, 2003.
[29] Jonathan J. Oliver, et al. MDL and MML: Similarities and differences, 1994.
[30] David Haussler, et al. Exploiting Generative Models in Discriminative Classifiers, 1998, NIPS.
[31] Andrew J. Viterbi. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm, 1967, IEEE Trans. Inf. Theory.
[32] Eamonn J. Keogh, et al. A symbolic representation of time series, with implications for streaming algorithms, 2003, DMKD '03.
[33] D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.
[34] Carla E. Brodley, et al. Spam Filtering Using Inexact String Matching in Explicit Feature Space with On-Line Linear Classifiers, 2006, TREC.
[35] Richard K. Belew, et al. Lexical dynamics and conceptual change: Analyses and implications for information retrieval, 2003.
[36] D. Sculley, et al. Relaxed online SVMs for spam filtering, 2007, SIGIR.
[37] Jorma Rissanen, et al. An MDL Framework for Data Clustering, 2005.
[38] Honglak Lee, et al. Spam Deobfuscation using a Hidden Markov Model, 2005, CEAS.
[39] Thorsten Joachims, et al. Making large-scale support vector machine learning practical, 1999.
[40] Susan T. Dumais, et al. Newsjunkie: providing personalized newsfeeds via analysis of information novelty, 2004, WWW '04.
[41] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.
[42] R. Durbin, et al. Fast and accurate short read alignment with Burrows–Wheeler transform, 2009.
[43] José María Gómez Hidalgo, et al. Evaluating cost-sensitive Unsolicited Bulk Email categorization, 2002, SAC '02.
[44] Thomas Reinartz, et al. A Unifying View on Instance Selection, 2002, Data Mining and Knowledge Discovery.
[45] Ming Li, et al. Clustering by compression, 2003, IEEE International Symposium on Information Theory, 2003. Proceedings.
[46] Karl-Michael Schneider, et al. A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering, 2003, EACL.
[47] Georgios Paliouras, et al. A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists, 2004, Information Retrieval.
[48] David L. Dowe, et al. Minimum Message Length and Kolmogorov Complexity, 1999, Comput. J.
[49] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[50] Carla E. Brodley, et al. Advances in online learning-based spam filtering, 2008.
[51] George Forman, et al. Learning from Little: Comparison of Classifiers Given Little Training, 2004, PKDD.
[52] Blaz Zupan, et al. Towards Practical PPM Spam Filtering: Experiments for the TREC 2006 Spam Track, 2006, TREC.
[53] Ray J. Solomonoff, et al. Complexity-based induction systems: Comparisons and convergence theorems, 1978, IEEE Trans. Inf. Theory.
[54] Thomas Gärtner, et al. WBCsvm: Weighted Bayesian Classification based on Support Vector Machines, 2001, ICML.
[55] Georgios Paliouras, et al. An evaluation of Naive Bayesian anti-spam filtering, 2000, ArXiv.
[56] T. Cover, et al. A sandwich proof of the Shannon-McMillan-Breiman theorem, 1988.
[57] Peter Grünwald, et al. A tutorial introduction to the minimum description length principle, 2004, ArXiv.
[58] Pavel Berkhin, et al. A Survey of Clustering Data Mining Techniques, 2006, Grouping Multidimensional Data.
[59] Ian H. Witten, et al. Generating Accurate Rule Sets Without Global Optimization, 1998, ICML.
[60] Gabriel Webster. Improving letter-to-pronunciation accuracy with automatic morphologically-based stress prediction, 2004, INTERSPEECH.
[61] Roland Kuhn, et al. Automatic methods for lexical stress assignment and syllabification, 2000, INTERSPEECH.
[62] Alan W. Black, et al. Issues in building general letter to sound rules, 1998, SSW.
[63] Ran El-Yaniv, et al. Online Choice of Active Learning Algorithms, 2003, J. Mach. Learn. Res.
[64] Hermann Ney, et al. On structuring probabilistic dependences in stochastic language modelling, 1994, Comput. Speech Lang.
[65] Aiko M. Hormann, et al. Programs for Machine Learning. Part I, 1962, Inf. Control.
[66] Cor J. Veenman, et al. Forensic Authorship Attribution Using Compression Distances to Prototypes, 2009, IWCF.
[67] Claude E. Shannon. A Mathematical Theory of Communication, 1948, Bell Syst. Tech. J.
[68] Xiangde Zhang, et al. Use of the Burrows–Wheeler similarity distribution to the comparison of the proteins, 2010, Amino Acids.
[69] David A. Huffman, et al. A method for the construction of minimum-redundancy codes, 1952, Proceedings of the IRE.
[70] Raffaele Giancarlo, et al. Textual data compression in computational biology: a synopsis, 2009, Bioinform.
[71] H. Sebastian Seung, et al. Selective Sampling Using the Query by Committee Algorithm, 1997, Machine Learning.
[72] Ting Su, et al. In search of deterministic methods for initializing K-means and Gaussian mixture clustering, 2007, Intell. Data Anal.
[73] Brockway McMillan, et al. Two inequalities implied by unique decipherability, 1956, IRE Trans. Inf. Theory.
[74] Delbert Dueck, et al. Clustering by Passing Messages Between Data Points, 2007, Science.
[75] John G. Cleary, et al. The entropy of English using PPM-based models, 1996, Proceedings of Data Compression Conference - DCC '96.
[76] Ido Dagan, et al. Mistake-Driven Learning in Text Categorization, 1997, EMNLP.
[77] Ran El-Yaniv, et al. Distributional Word Clusters vs. Words for Text Categorization, 2003, J. Mach. Learn. Res.
[78] Byron Dom, et al. An Information-Theoretic External Cluster-Validity Measure, 2002, UAI.
[79] Stan Matwin, et al. Intrinsic Plagiarism Detection using Complexity Analysis, 2009.
[80] Y. Shtarkov, et al. The context-tree weighting method: basic properties, 1995, IEEE Trans. Inf. Theory.
[81] Francesco Romani, et al. Ranking a stream of news, 2005, WWW '05.
[82] William S. Yerazunis, et al. CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track, 2005, TREC.
[83] Johan Hovold, et al. Naive Bayes spam filtering using word-position-based attributes and length-sensitive classification thresholds, 2005, CEAS.
[84] Terry A. Welch, et al. A Technique for High-Performance Data Compression, 1984, Computer.
[85] Yiming Yang, et al. A re-examination of text categorization methods, 1999, SIGIR '99.
[86] J. Rissanen. Stochastic Complexity in Statistical Inquiry Theory, 1989.
[87] Walmir M. Caminhas, et al. A review of machine learning approaches to Spam filtering, 2009, Expert Syst. Appl.
[88] Ran El-Yaniv, et al. On Prediction Using Variable Order Markov Models, 2004, J. Artif. Intell. Res.
[89] Jerneja Zganec-Gros, et al. Slovenian Text-to-Speech Synthesis for Speech User Interfaces, 2005, WEC.
[90] Vera Demberg, et al. Phonological Constraints and Morphological Preprocessing for Grapheme-to-Phoneme Conversion, 2007, ACL.
[91] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[92] Xin Chen, et al. A compression algorithm for DNA sequences and its applications in genome comparison, 2000, RECOMB '00.
[93] Lluís Màrquez i Villodre, et al. Boosting Trees for Anti-Spam Email Filtering, 2001, ArXiv.
[94] Yimin Wu, et al. Three Non-Bayesian Methods of Spam Filtration: CRM114 at TREC 2007, 2007, TREC.
[95] Boris Mirkin, et al. Clustering For Data Mining: A Data Recovery Approach (Chapman & Hall/CRC Computer Science), 2005.
[96] B. Hayes. How many ways can you spell V1@gra?, 2007.
[97] Gregory J. Chaitin, et al. On the Length of Programs for Computing Finite Binary Sequences, 1966, JACM.
[98] Ian H. Witten, et al. Text categorization using compression models, 2000, Proceedings DCC 2000. Data Compression Conference.
[99] Jorma Rissanen, et al. The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.
[100] Khalid Sayood, et al. Introduction to Data Compression, 1996.
[101] Fidelis Assis. OSBF-Lua - A Text Classification Module for Lua: The Importance of the Training Method, 2006, TREC.
[102] Robert E. Schapire, et al. Predicting Nearly as Well as the Best Pruning of a Decision Tree, 1995, COLT.
[103] David L. Dowe, et al. Message Length as an Effective Ockham's Razor in Decision Tree Induction, 2001, International Conference on Artificial Intelligence and Statistics.
[104] Andrew McCallum, et al. Employing EM and Pool-Based Active Learning for Text Classification, 1998, ICML.
[105] Paul M. B. Vitányi, et al. Kolmogorov Complexity and Information Theory. With an Interpretation in Terms of Questions and Answers, 2003, J. Log. Lang. Inf.
[106] Andrew McCallum, et al. Toward Optimal Active Learning through Sampling Estimation of Error Reduction, 2001, ICML.
[107] John Cocke, et al. A Statistical Approach to Machine Translation, 1990, CL.
[108] Jean-Philippe Vert, et al. The context-tree kernel for strings, 2005, Neural Networks.
[109] Teofilo F. Gonzalez, et al. Clustering to Minimize the Maximum Intercluster Distance, 1985, Theor. Comput. Sci.
[110] D. Sculley, et al. Filtering Email Spam in the Presence of Noisy User Feedback, 2008, CEAS.
[111] H. Sebastian Seung, et al. Query by committee, 1992, COLT '92.
[112] Geoff Holmes, et al. Correcting English text using PPM models, 1998, Proceedings DCC '98 Data Compression Conference.
[113] Vangelis Metsis, et al. Spam Filtering with Naive Bayes - Which Naive Bayes?, 2006, CEAS.
[114] Marcus Hutter, et al. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (Texts in Theoretical Computer Science. An EATCS Series), 2006.
[115] W. Teahan. Probability estimation for PPM, 1995.
[116] Huan Liu, et al. Data Reduction via Instance Selection, 2001.
[117] James Allan, et al. Extracting significant time varying features from text, 1999, CIKM '99.
[118] Dragos Burileanu, et al. A statistical approach to lexical stress assignment for TTS synthesis, 2009, Int. J. Speech Technol.
[119] Mark Levene, et al. A suffix tree approach to anti-spam email filtering, 2006, Machine Learning.
[120] David J. C. MacKay, et al. Information-Based Objective Functions for Active Data Selection, 1992, Neural Computation.
[121] William S. Yerazunis. Seven Hypotheses about Spam Filtering, 2006, TREC.
[122] Gordon V. Cormack, et al. Email Spam Filtering: A Systematic Review, 2008, Found. Trends Inf. Retr.
[123] Tomaž Šef, et al. Data Mining for Creating Accentuation Rules, 2004.
[124] Jerneja Zganec-Gros, et al. SI-PRON Pronunciation Lexicon: a New Language Resource for Slovenian, 2006, Informatica.
[125] Georgios Paliouras, et al. Stacking Classifiers for Anti-Spam Filtering of E-Mail, 2001, EMNLP.
[126] Leon Gordon Kraft, et al. A device for quantizing, grouping, and coding amplitude-modulated pulses, 1949.
[127] Ian H. Witten, et al. Data Compression Using Adaptive Coding and Partial String Matching, 1984, IEEE Trans. Commun.
[128] C. S. Wallace, et al. An Information Measure for Classification, 1968, Comput. J.
[129] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.
[130] Ali S. Hadi, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1991.
[131] Thomas Richard Lynam, et al. Spam Filter Improvement Through Measurement, 2009.
[132] Konstantin Tretyakov, et al. Machine Learning Techniques in Spam Filtering, 2004.
[133] Richard Segal, et al. IBM SpamGuru on the TREC 2005 Spam Track, 2005, TREC.
[134] David A. Cohn, et al. Active Learning with Statistical Models, 1996, NIPS.
[135] William A. Gale, et al. A sequential algorithm for training text classifiers, 1994, SIGIR '94.
[136] Tony Andrew Meyer. A TREC Along the Spam Track with SpamBayes, 2005, TREC.
[137] Isidore Rigoutsos, et al. Chung-Kwei: a Pattern-discovery-based System for the Automatic Identification of Unsolicited E-mail Messages (SPAM), 2004, CEAS.
[138] Charles L. A. Clarke, et al. Using dynamic Markov compression to detect vandalism in the Wikipedia, 2009, SIGIR.
[139] Xin Chen, et al. Shared information and program plagiarism detection, 2004, IEEE Transactions on Information Theory.
[140] Natalio Krasnogor, et al. Measuring the similarity of protein structures by means of the universal similarity metric, 2004, Bioinform.
[141] Paul G. Howard, et al. The design and analysis of efficient lossless data compression systems, 1993.
[142] Yiming Yang, et al. Topic Detection and Tracking Pilot Study Final Report, 1998.
[143] Marek Grochowski, et al. Comparison of Instance Selection Algorithms I. Algorithms Survey, 2004, ICAISC.
[144] Vittorio Loreto, et al. Language trees and zipping, 2002, Physical Review Letters.
[145] Peter Weiner, et al. Linear Pattern Matching Algorithms, 1973, SWAT.
[146] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 1997, Texts in Computer Science.
[147] H. Jeffreys. An invariant form for the prior probability in estimation problems, 1946, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[148] Bogdan Filipic, et al. Exploiting structural information for semi-structured document categorization, 2006, Inf. Process. Manag.
[149] Bin Ma, et al. Chain letters & evolutionary histories, 2003, Scientific American.
[150] Xin Chen, et al. An information-based sequence distance and its application to whole mitochondrial genome phylogeny, 2001, Bioinform.
[151] Joshua Goodman, et al. Online Discriminative Spam Filter Training, 2006, CEAS.
[152] Gordon V. Cormack, et al. Batch and Online Spam Filter Comparison, 2006, CEAS.
[153] Olivier Catoni, et al. Statistical learning theory and stochastic optimization, 2004.
[154] Jorma Rissanen, et al. Complexity of strings in the class of Markov sources, 1986, IEEE Trans. Inf. Theory.
[155] David Evans, et al. Tracking and summarizing news on a daily basis with Columbia's Newsblaster, 2002.
[156] Jorma Rissanen, et al. Fisher information and stochastic complexity, 1996, IEEE Trans. Inf. Theory.
[157] Murat Kantarcioglu, et al. Compression for Anti-Adversarial Learning, 2011, PAKDD.
[158] Luiz Eduardo Soares de Oliveira, et al. Author Identification Using Compression Models, 2022.
[159] Nizar Bouguila, et al. A study of spam filtering using support vector machines, 2010, Artificial Intelligence Review.
[160] M. M. Astrahan. Speech Analysis by Clustering, or the Hyperphoneme Method, 1970.
[161] Myeong-Kwan Kevin Cheon, et al. Frank and I, 2012.
[162] Bart Goethals, et al. Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach, 2008.
[163] Raffaele Giancarlo, et al. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment, 2007, BMC Bioinformatics.
[164] Georgios Paliouras, et al. Filtron: A Learning-Based Anti-Spam Filter, 2004, CEAS.
[165] Arnold W. M. Smeulders, et al. Active learning using pre-clustering, 2004, ICML.
[166] Ian H. Witten, et al. Adaptive text mining: inferring structure from sequences, 2004, J. Discrete Algorithms.
[167] Nancy Ide, et al. The MULTEXT East corpus, 1998, LREC.
[168] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.
[169] Chris S. Wallace, et al. The Complexity of Strict Minimum Message Length Inference, 2002, Comput. J.
[170] Gordon V. Cormack, et al. Spam and the ongoing battle for the inbox, 2007, CACM.
[171] Blaz Zupan, et al. Spam Filtering Using Statistical Data Compression Models, 2006, J. Mach. Learn. Res.
[172] Yiming Yang, et al. A study of retrospective and on-line event detection, 1998, SIGIR '98.
[173] Carla E. Brodley, et al. Compression and machine learning: a new perspective on feature space vectors, 2006, Data Compression Conference (DCC'06).
[174] Yiming Yang, et al. Topic-conditioned novelty detection, 2002, KDD.
[175] Ko Fujimura, et al. Tweet classification by data compression, 2011, DETECT '11.
[176] Tony R. Martinez, et al. Reduction Techniques for Instance-Based Learning Algorithms, 2000, Machine Learning.
[177] Jorma Rissanen, et al. Generalized Kraft Inequality and Arithmetic Coding, 1976, IBM J. Res. Dev.
[178] Gordon V. Cormack. University of Waterloo Participation in the TREC 2007 Spam Track, 2007, TREC.
[179] Gordon V. Cormack, et al. TREC 2006 Spam Track Overview, 2006, TREC.
[180] Hermann Ney, et al. Joint-sequence models for grapheme-to-phoneme conversion, 2008, Speech Commun.
[181] R. Solomonoff. A Preliminary Report on a General Theory of Inductive Inference, 2001.
[182] W. Krauth, et al. Learning algorithms with optimal stability in neural networks, 1987.
[183] Shyhtsun Felix Wu, et al. On Attacking Statistical Spam Filters, 2004, CEAS.
[184] Li Wei, et al. Compression-based data mining of sequential data, 2007, Data Mining and Knowledge Discovery.
[185] Ming Li, et al. Inductive Reasoning and Kolmogorov Complexity, 1992, J. Comput. Syst. Sci.
[186] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.
[187] Gunnar Rätsch, et al. A New Discriminative Kernel from Probabilistic Models, 2001, Neural Computation.
[188] Xiaowei Xu, et al. Representative Sampling for Text Classification Using Support Vector Machines, 2003, ECIR.
[189] Yan Zhou, et al. A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters, 2008, J. Mach. Learn. Res.
[190] Ian H. Witten, et al. Arithmetic coding for data compression, 1987, CACM.
[191] Gary Robinson, et al. A statistical approach to the spam problem, 2003.
[192] Greg Schohn, et al. Less is More: Active Learning with Support Vector Machines, 2000, ICML.
[193] Dmitry A. Shkarin, et al. PPM: one step to practicality, 2002, Proceedings DCC 2002. Data Compression Conference.
[194] Laurence A. F. Park. Bootstrap confidence intervals for Mean Average Precision, 2011.
[195] Jorma Rissanen, et al. Universal coding, information, prediction, and estimation, 1984, IEEE Trans. Inf. Theory.
[196] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[197] Frans M. J. Willems, et al. The Context-Tree Weighting Method: Extensions, 1998, IEEE Trans. Inf. Theory.
[198] Yan Zhou, et al. Malware detection using adaptive data compression, 2008, AISec '08.
[199] Robert L. Mercer, et al. An Estimate of an Upper Bound for the Entropy of English, 1992, CL.
[200] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.
[201] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.
[202] Seunghak Lee, et al. Dynamically Weighted Hidden Markov Model for Spam Deobfuscation, 2007, IJCAI.
[203] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.
[204] Peter Grünwald, et al. Invited review of the book Statistical and Inductive Inference by Minimum Message Length, 2006.
[205] L. A. Breyer. DBACL at the TREC 2005, 2005, TREC.
[206] Pedro M. Domingos. The Role of Occam's Razor in Knowledge Discovery, 1999, Data Mining and Knowledge Discovery.
[207] Gordon V. Cormack, et al. Online supervised spam filter evaluation, 2007, TOIS.
[208] Frédéric Bimbot, et al. Variable-length sequence matching for phonetic transcription using joint multigrams, 1995, EUROSPEECH.
[209] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.
[210] Daphne Koller, et al. Support Vector Machine Active Learning with Applications to Text Classification, 2000, J. Mach. Learn. Res.
[211] Michael J. Brusco, et al. Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques, 2007, J. Classif.
[212] Junyu Niu, et al. WIM at TREC 2007, 2007, TREC.
[213] Daniel Boley, et al. Principal Direction Divisive Partitioning, 1998, Data Mining and Knowledge Discovery.
[214] Ryan Thomas, et al. Grapheme to phoneme conversion and dictionary verification using graphonemes, 2003, INTERSPEECH.
[215] Maja Skrjanc, et al. Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language, 2002, TSD.
[216] Bin Ma, et al. The similarity metric, 2001, IEEE Transactions on Information Theory.
[217] Paul Taylor, et al. Hidden Markov models for grapheme to phoneme conversion, 2005, INTERSPEECH.
[218] James F. Allen, et al. Bi-directional conversion between graphemes and phonemes using a joint N-gram model, 2001, SSW.
[219] Zaher Dawy, et al. Implementing the context tree weighting method for content recognition, 2004, Data Compression Conference, 2004. Proceedings. DCC 2004.
[220] George Karypis, et al. A Comparison of Document Clustering Techniques, 2000.
[221] Dale Schuurmans, et al. Augmenting Naive Bayes Classifiers with Statistical Language Models, 2004, Information Retrieval.
[222] Brigham Anderson, et al. Active learning for Hidden Markov Models: objective functions and algorithms, 2005, ICML.
[223] Péter Gács, et al. Information Distance, 1998, IEEE Trans. Inf. Theory.
[224] R. Nigel Horspool, et al. Data Compression Using Dynamic Markov Modelling, 1987, Comput. J.
[225] Georgios Paliouras, et al. Learning to Filter Unsolicited Commercial E-Mail, 2006.
[226] William John Teahan, et al. Text classification and segmentation using minimum cross-entropy, 2000, RIAO.
[227] Andrew McCallum, et al. A comparison of event models for naive Bayes text classification, 1998, AAAI 1998.
[228] Hermann Ney, et al. Investigations on joint-multigram models for grapheme-to-phoneme conversion, 2002, INTERSPEECH.
[229] Dragomir R. Radev, et al. NewsInEssence: summarizing online news topics, 2005, Commun. ACM.
[230] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[231] Ian H. Witten, et al. Text mining: a new frontier for lossless compression, 1999, Proceedings DCC'99 Data Compression Conference.
[232] Matjaz Gams, et al. Analysis of Automatic Stress Assignment in Slovene, 2009, Informatica.
[233] Simson L. Garfinkel, et al. Stopping Spam, 1998.
[234] John G. Cleary, et al. Unbounded Length Contexts for PPM, 1997.
[235] A. Bratko, et al. Comparison between Humans and Machines on the Task of Accentuation of Slovene Words, 2005.
[236] Man Lan, et al. Initialization of cluster refinement algorithms: a review and comparative study, 2004, 2004 IEEE International Joint Conference on Neural Networks.
[237] Gordon V. Cormack, et al. Statistical precision of information retrieval evaluation, 2006, SIGIR.
[238] Burr Settles, et al. Active Learning, 2012, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[239] Bogdan Filipic, et al. Spam Filtering Using Character-Level Markov Models: Experiments for the TREC 2005 Spam Track, 2005, TREC.
[240] Grzegorz Kondrak, et al. A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion, 2009, ACL/IJCNLP.
[241] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.
[242] David D. Lewis, et al. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, 1998, ECML.
[243] Ian H. Witten, et al. Managing Gigabytes: Compressing and Indexing Documents and Images, 1999.