Language identification in web pages
[1] Peter Henrich. Language identification for the automatic grapheme-to-phoneme conversion of foreign words in a German text-to-speech system , 1989, EUROSPEECH.
[2] Sylvain Delisle,et al. Text Classification and Multilinguism: Getting at Words via N-grams of Characters , 2002 .
[3] James Mayfield,et al. Character N-Gram Tokenization for European Language Text Retrieval , 2004, Information Retrieval.
[4] W. B. Cavnar,et al. N-gram-based text categorization , 1994 .
[5] M Damashek,et al. Gauging Similarity with n-Grams: Language-Independent Categorization of Text , 1995, Science.
[6] Douglas-Val Ziegler. The automatic identification of languages using linguistic recognition signals , 1992 .
[7] Pasi Tapanainen,et al. What is a word, What is a sentence? Problems of Tokenization , 1994 .
[8] Gregory B. Newby,et al. Information Space Based on HTML Structure , 2000, TREC.
[9] Javed A. Aslam,et al. An information-theoretic measure for document similarity , 2003, SIGIR.
[10] Einat Amitay,et al. Hypertext: The Importance of being Different , 1997 .
[11] Massimo Marchiori,et al. The Limits of Web Metadata, and Beyond , 1998, Comput. Networks.
[12] Dekang Lin,et al. An Information-Theoretic Definition of Similarity , 1998, ICML.
[13] Rafael Dueire Lins,et al. Automatic language identification of written texts , 2004, SAC '04.
[14] Weiyi Meng,et al. Using the Structure of HTML Documents to Improve Retrieval , 1997, USENIX Symposium on Internet Technologies and Systems.
[15] Ted E. Dunning,et al. Statistical Identification of Language , 1994 .
[16] Jon M. Kleinberg,et al. Mining the Web's Link Structure , 1999, Computer.
[17] Dan Shen,et al. Performance and Scalability of a Large-Scale N-gram Based Information Retrieval System , 2000, J. Digit. Inf..
[18] Einat Amitay,et al. Using common hypertext links to identify the best phrasal description of target web documents , 1998 .
[19] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters , 1953 .
[20] Penelope Sibun,et al. Language Determination: Natural Language Processing from Scanned Document Images , 1994, ANLP.
[21] David W. Conrath,et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.
[22] Graeme Hirst,et al. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures , 2004 .