The textcat Package for n-Gram Based Text Categorization in R
[1] Sung-Hyuk Cha,et al. Language Identification from Text Using N-gram Based Cumulative Frequency Addition , 2004 .
[2] Kevin P. Scannell. The Crúbadán Project: Corpus building for under-resourced languages , 2007 .
[3] Anil Kumar Singh. Study of Some Distance Measures for Language and Encoding Identification , 2006 .
[4] Yuen Ren Chao,et al. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .
[5] N. Mikelic,et al. Language Indentification: How to Distinguish Similar Languages? , 2007, 2007 29th International Conference on Information Technology Interfaces.
[6] Elisabeth Dévière,et al. Analyzing linguistic data: a practical introduction to statistics using R , 2009 .
[7] Ted E. Dunning,et al. Statistical Identification of Language , 1994 .
[8] R Core Team,et al. R: A language and environment for statistical computing. , 2014 .
[9] Peter Henrich. Language identification for the automatic grapheme-to-phoneme conversion of foreign words in a German text-to-speech system , 1989, EUROSPEECH.
[10] Kurt Hornik,et al. Text Mining Infrastructure in R , 2008 .
[11] Laila Khreisat,et al. A machine learning approach for Arabic text classification using N-gram frequency statistics , 2009, J. Informetrics.
[12] David McKelvie,et al. Data in Your Language: the ECI Multilingual Corpus 1 , 2007 .
[13] W. B. Cavnar,et al. N-gram-based text categorization , 1994 .
[14] Leo Egghe,et al. The Distribution of N-Grams , 2000, Scientometrics.
[15] Kenneth R. Beesley,et al. Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-line Text , 1988 .
[16] Probabilistic Language Modelling , 2002 .
[17] Philip Hanna,et al. Extending Zipf’s law to n-grams for large corpora , 2009, Artificial Intelligence Review.