An N-Gram-and-Wikipedia joint approach to Natural Language Identification