Processing highly variant language using incremental model selection
[1] Mike Rosner, et al. A tagging algorithm for mixed language identification in a noisy domain, 2007, INTERSPEECH.
[2] David I. Holmes, et al. Neural network applications in stylometry: The Federalist Papers, 1996, Comput. Humanit.
[3] George M. Mohay, et al. Mining e-mail content for author identification forensics, 2001, SIGMOD Rec.
[4] Glenn Fung, et al. The disputed federalist papers: SVM feature selection via concave minimization, 2003, TAPIA '03.
[5] Mathias Schulze, et al. Towards Authentic Tasks and Experiences: the Example of Parser-based CALL, 2022.
[6] Yang Liu, et al. Learning to Predict Code-Switching Points, 2008, EMNLP.
[7] Rajarathnam Chandramouli, et al. Gender identification from E-mails, 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.
[8] Ronald Rosenfeld, et al. Adaptive Statistical Language Modeling: A Maximum Entropy Approach, 1994.
[9] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.
[10] David Palfreyman, et al. "A Funky Language for Teenzz to Use": Representing Gulf Arabic in Instant Messaging, 2006, J. Comput. Mediat. Commun.
[11] Dmitry V. Khmelev, et al. Using Literal and Grammatical Statistics for Authorship Attribution, 2001, Probl. Inf. Transm.
[12] Kim Luyckx, et al. Scalability Issues in Authorship Attribution, 2011.
[13] Pascal Denis, et al. Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort, 2009, PACLIC.
[14] Brendan T. O'Connor, et al. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments, 2010, ACL.
[15] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.
[16] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.
[17] José Gabriel Pereira Lopes, et al. Longest Sorted Sequence Algorithm for Parallel Text Alignment, 2005, EUROCAST.
[18] François Yvon, et al. Detecting Fake Content with Relative Entropy Scoring, 2008, PAN.
[19] A Concurrent Validity Study of the Raygor Readability Estimate, 1979.
[20] Chris Taylor, et al. Error Correction for Arabic Dictionary Lookup, 2010, LREC.
[21] Mikko Kurimo, et al. Morfessor and variKN machine learning tools for speech and language technology, 2007, INTERSPEECH.
[22] Mark Warschauer, et al. Language Choice Online: Globalization and Identity in Egypt, 2006, J. Comput. Mediat. Commun.
[23] Hinrich Schütze, et al. Automatic Detection of Text Genre, 1997, ACL.
[24] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.
[25] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.
[26] Lluís Màrquez i Villodre, et al. SVMTool: A general POS Tagger Generator Based on Support Vector Machines, 2004, LREC.
[27] Mohammad Ali Yaghan, et al. Arabizi: A Contemporary Style of Arabic Slang, 2008, Design Issues.
[28] Hinrich Schütze, et al. Introduction to information retrieval, 2008.
[29] G. Harry McLaughlin, et al. SMOG Grading - A New Readability Formula, 1969.
[30] Toshikazu Ikuta, et al. On Statistical Parameter Setting, 2004.
[31] S. Muthukrishnan, et al. Data streams: algorithms and applications, 2005, SODA '03.
[32] V. Melissa Holland, et al. Parsers in Tutors: What Are They Good For?, 2013.
[33] Neri Merhav, et al. A measure of relative entropy between individual sequences with application to universal classification, 1993, IEEE Trans. Inf. Theory.
[34] Ronald Wardhaugh. An introduction to sociolinguistics, 1988.
[35] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.
[36] Jeffrey Heath, et al. Jewish and Muslim Dialects of Moroccan Arabic, 2002.
[37] Bin Ma, et al. The similarity metric, 2001, IEEE Transactions on Information Theory.
[38] Matthias Scheutz, et al. Robust spoken instruction understanding for HRI, 2010, HRI 2010.
[39] Christer Samuelsson, et al. Grammar Specialization Through Entropy Thresholds, 1994, ACL.
[40] Paul M. B. Vitányi, et al. Clustering by compression, 2003, IEEE Transactions on Information Theory.
[41] Jimmy J. Lin, et al. Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, 2010, CL.
[42] Mor Naaman, et al. Is it really about me?: message content in social awareness streams, 2010, CSCW '10.
[43] Peter N. Yianilos, et al. Learning String-Edit Distance, 1996, IEEE Trans. Pattern Anal. Mach. Intell.
[44] Akira Kurematsu, et al. Language model selection based on the analysis of Japanese spontaneous speech on travel arrangement task, 1999, EUROSPEECH.
[45] W. B. Cavnar, et al. N-gram-based text categorization, 1994.
[46] Steven Abney, et al. Parsing By Chunks, 1991.
[47] Ioannis Pitas, et al. Language identification in web documents using discrete HMMs, 2004, Pattern Recognit.
[48] Tommi Vatanen, et al. Language Identification of Short Text Segments with N-gram Models, 2010, LREC.
[49] Toshikazu Ikuta, et al. On unsupervised grammar induction from untagged corpora, 2006.
[50] Paul Rodrigues, et al. Learning Arabic Morphology With Information Theory, 2005.
[51] Suresh Venkatasubramanian, et al. Streaming for large scale NLP: Language Modeling, 2009, NAACL.
[52] Efstathios Stamatatos. A survey of modern authorship attribution methods, 2009.
[53] Geoffrey Sampson, et al. A proposal for improving the measurement of parse accuracy, 2000.
[54] Paul McNamee, et al. Language identification: a solved problem suitable for undergraduate instruction, 2005.
[55] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.
[56] Vittorio Loreto, et al. Language trees and zipping, 2002, Physical review letters.
[57] Sandra Kübler, et al. POS Tagging for German: how important is the Right Context?, 2008, LREC.
[58] Martin Chodorow, et al. Automated Essay Scoring for Nonnative English Speakers, 1999.
[59] Mahmoud A. Al-Khatib, et al. Language Choice in Mobile Text Messages among Jordanian University Students, 2008.
[60] Véronique Hoste, et al. Towards an Improved Methodology for Automated Readability Prediction, 2010, LREC.
[61] Daniel Shawcross Wilkerson, et al. Winnowing: local algorithms for document fingerprinting, 2003, SIGMOD '03.
[62] Jörg Kindermann, et al. Authorship Attribution with Support Vector Machines, 2003, Applied Intelligence.
[63] Johnnie F. Caver. Novel Topic Impact on Authorship Attribution, 2009.
[64] Christiana Themistocleous. Written Cypriot Greek in online chat: Usage and attitudes, 2007.
[65] Sabine Buchholz, et al. Introduction to the CoNLL-2000 Shared Task Chunking, 2000, CoNLL/LLL.
[66] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[67] Sandra Kuebler, et al. A statistical method for syntactic dialectometry, 2010.
[68] Grzegorz Kondrak, et al. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion, 2008, ACL.
[69] Mari Ostendorf, et al. Classifying Factored Genres with Part-of-Speech Histograms, 2009, HLT-NAACL.
[70] J. Chaker, et al. Genre Categorization of Web Pages, 2007.
[71] V. Loreto, et al. Data compression and learning in time sequences analysis, 2002, cond-mat/0207321.
[72] Benjamin C. M. Fung, et al. Mining writeprints from anonymous e-mails for forensic investigation, 2010, Digit. Investig.
[73] Sholom M. Weiss, et al. Towards language independent automated learning of text categorization models, 1994, SIGIR '94.
[74] Slav Petrov, et al. A Universal Part-of-Speech Tagset, 2011, LREC.
[75] Nicola Cancedda, et al. Corpus-Based Grammar Specialization, 2000, CoNLL/LLL.
[76] Richard Dazeley, et al. Authorship Attribution for Twitter in 140 Characters or Less, 2010, 2010 Second Cybercrime and Trustworthy Computing Workshop.
[77] Xin Chen, et al. Shared information and program plagiarism detection, 2004, IEEE Transactions on Information Theory.
[78] Thomas L. Griffiths, et al. The Author-Topic Model for Authors and Documents, 2004, UAI.
[79] Grzegorz Kondrak, et al. Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion, 2007, NAACL.
[80] M. Coleman, et al. A computer readability formula designed for machine scoring, 1975.
[81] Manny Rayner, et al. Fast Parsing Using Pruning and Grammar Specialization, 1996, ACL.
[82] Rong Zheng, et al. From fingerprint to writeprint, 2006, Commun. ACM.
[83] Boris Katz, et al. A Comparative Study of Language Models for Book and Author Recognition, 2005, IJCNLP.
[84] Hsinchun Chen, et al. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace, 2008, TOIS.
[85] Matthias Scheutz, et al. Adding Context Information to Part Of Speech Tagging for Dialogues, 2010.
[86] Hideki Kashioka, et al. Trigger-Pair Predictors in Parsing and Tagging, 1998, COLING-ACL.
[87] J. M. Prager. Linguini: language identification for multilingual documents, 1999.
[88] Thorsten Brants, et al. TnT – A Statistical Part-of-Speech Tagger, 2000, ANLP.
[89] Paul Rodrigues, et al. Learning Arabic morphology using statistical constraint-satisfaction models, 2007.
[90] Khalil Sima'an, et al. Parsing with subdomain instance weighting from raw corpora, 2008, INTERSPEECH.
[91] Yaser Al-Onaizan, et al. Machine Transliteration of Names in Arabic Texts, 2002, SEMITIC@ACL.
[92] Efstathios Stamatatos, et al. Text Genre Detection Using Common Word Frequencies, 2000, COLING.
[93] Kristy Hollingshead, et al. Formalizing the Use and Characteristics of Constraints in Pipeline Systems, 2010.