Machine Learning Models

[1]  Jacques Savoy,et al.  Trump’s and Clinton’s Style and Rhetoric during the 2016 Presidential Election , 2018, Journal of Quantitative Linguistics.

[2]  Jacques Savoy,et al.  Authorship Attribution Based on Specific Vocabulary , 2012, TOIS.

[3]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[4]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[5]  Jacques Savoy,et al.  Authorship attribution based on a probabilistic topic model , 2013, Inf. Process. Manag..

[6]  R. Harald Baayen,et al.  How Variable May a Constant be? Measures of Lexical Richness in Perspective , 1998, Comput. Humanit..

[7]  Julio Gonzalo,et al.  Propagating sentiment signals for estimating reputation polarity , 2019, Inf. Process. Manag..

[8]  Jacques Savoy,et al.  Authorship of Pauline epistles revisited , 2019, J. Assoc. Inf. Sci. Technol..

[9]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[10]  Jacques Savoy,et al.  Analysis of the Style and the Rhetoric of the American Presidents Over Two Centuries , 2017, Glottometrics.

[11]  Jacques Savoy,et al.  Is Starnone really the author behind Ferrante? , 2018, Digit. Scholarsh. Humanit..

[12]  Yair Neuman Computational Personality Analysis , 2016, Springer International Publishing.

[13]  Matthew Motta,et al.  The Dynamics and Political Implications of Anti-Intellectualism in the United States , 2017 .

[14]  S. Argamon,et al.  The “Fundamental Problem” of Authorship Attribution , 2012 .

[15]  Maciej Eder,et al.  Does size matter? Authorship attribution, small samples, big problem , 2015, Digit. Scholarsh. Humanit..

[16]  L. Brown,et al.  Interval Estimation for a Binomial Proportion , 2001 .

[17]  Janyce Wiebe,et al.  MPQA 3.0: An Entity/Event-Level Sentiment Corpus , 2015, NAACL.

[18]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[19]  Patrick Juola,et al.  The Time Course of Language Change , 2003, Comput. Humanit..

[20]  D. Wolpert The Supervised Learning No-Free-Lunch Theorems , 2002 .

[21]  P. Eckert,et al.  Language and Gender: Introduction to the study of language and gender , 2013 .

[22]  Fredrik Johansson,et al.  Supervised Classification of Twitter Accounts Based on Textual Content of Tweets , 2019, CLEF.

[23]  C. Lorenzini,et al.  Divided by a common language , 1996, Nature.

[24]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[25]  Eric Backer,et al.  On musical stylometry - a pattern recognition approac , 2005, Pattern Recognit. Lett..

[26]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[27]  John Burrows,et al.  All the Way Through: Testing for Authorship in Different Frequency Strata , 2007, Lit. Linguistic Comput..

[28]  George Forman,et al.  An Extensive Empirical Study of Feature Selection Metrics for Text Classification , 2003, J. Mach. Learn. Res..

[29]  Avi Arampatzis,et al.  Language models and fusion for authorship attribution , 2019, Inf. Process. Manag..

[30]  Bernard Ycart,et al.  Alberti's letter counts , 2012, Lit. Linguistic Comput..

[31]  Cyril Labbé,et al.  A Tool for Literary Studies: Intertextual Distance and Tree Classification , 2005, Lit. Linguistic Comput..

[32]  Leonard Neidorf,et al.  Large-scale quantitative profiling of the Old English verse tradition , 2019, Nature Human Behaviour.

[33]  Joemon M. Jose,et al.  Topic detection and tracking on heterogeneous information , 2017, Journal of Intelligent Information Systems.

[34]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.

[35]  Ido Dagan,et al.  Feature instability as a criterion for selecting potential style markers , 2006, J. Assoc. Inf. Sci. Technol..

[36]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[37]  David L. Hoover,et al.  The microanalysis of style variation , 2017, Digit. Scholarsh. Humanit..

[38]  Erez Lieberman Aiden,et al.  Quantitative Analysis of Culture Using Millions of Digitized Books , 2010, Science.

[39]  Benno Stein,et al.  Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter , 2018, CLEF.

[40]  Steven Benzel,et al.  A simple stylometric comparator: nifty assignment , 2015 .

[41]  Jacques Savoy,et al.  The Federalist Papers revisited: A collaborative attribution scheme , 2013, ASIST.

[42]  Manuel Cebrián,et al.  Reducing the Loss of Information through Annealing Text Distortion , 2011, IEEE Transactions on Knowledge and Data Engineering.

[43]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[44]  J. Pennebaker,et al.  The Secret Life of Pronouns , 2003, Psychological science.

[45]  Maciej Eder,et al.  Deeper Delta across genres and languages: do we really need the most frequent words? , 2011, Lit. Linguistic Comput..

[46]  Jose Nilo G. Binongo,et al.  The application of principal component analysis to stylometry , 1999 .

[47]  Caroline Coffin,et al.  Getting Started. Describing the grammar of speech and writing , 2005 .

[48]  Matthias Hagen,et al.  Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited , 2017, CLEF.

[49]  David I. Holmes,et al.  Neural network applications in stylometry: The Federalist Papers , 1996, Comput. Humanit..

[50]  Suhang Wang,et al.  Fake News Detection on Social Media: A Data Mining Perspective , 2017, SKDD.

[51]  Fazli Can,et al.  Change of Writing Style with Time , 2004, Comput. Humanit..

[52]  Maciej Eder,et al.  Do birds of a feather really flock together, or how to choose training samples for authorship attribution , 2013, Lit. Linguistic Comput..

[53]  Mike Kestemont,et al.  Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux , 2015, Digit. Scholarsh. Humanit..

[54]  Matt Taddy,et al.  Document Classification by Inversion of Distributed Language Representations , 2015, ACL.

[55]  Fabio Crestani,et al.  Experimental IR Meets Multilinguality, Multimodality, and Interaction: 10th International Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9–12, 2019, Proceedings , 2019, CLEF.

[56]  Raymond R. Panko,et al.  What we know about spreadsheet errors , 1998 .

[57]  C. Chaski Best Practices and Admissibility of Forensic Author Identification , 2013 .

[58]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[59]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[60]  J. A. Smith,et al.  Stylistic Constancy and Change Across Literary Corpora: Using Measures of Lexical Richness to Date Works , 2002, Comput. Humanit..

[61]  Stefan Schöberlein,et al.  Poe or not Poe? A stylometric analysis of Edgar Allan Poe's disputed writings , 2016, Digit. Scholarsh. Humanit..

[62]  Patrick Juola,et al.  Using the Google N-Gram corpus to measure cultural complexity , 2013, Lit. Linguistic Comput..

[63]  Kumiko Tanaka-Ishii,et al.  Computational Constancy Measures of Texts—Yule’s K and Rényi’s Entropy , 2015, CL.

[64]  Dominique Labbé,et al.  Experiments on authorship attribution by intertextual distance in english* , 2007, J. Quant. Linguistics.

[65]  José Nilo G. Binongo,et al.  Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution , 2003 .

[66]  Sophia Elizabeth De Morgan,et al.  Memoir of Augustus De Morgan: By His Wife Sophia Elizabeth De Morgan. With Selections from His Letters , 2005 .

[67]  H. Love Attributing Authorship: An Introduction , 2002 .

[68]  Olivier Ferret,et al.  Typing Relations in Distributional Thesauri , 2015 .

[69]  Claire Fautsch,et al.  Algorithmic stemmers or morphological analysisq An evaluation , 2009 .

[70]  Benno Stein,et al.  Cross-language plagiarism detection , 2011, Lang. Resour. Evaluation.

[71]  G. Caldarelli,et al.  The spreading of misinformation online , 2016, Proceedings of the National Academy of Sciences.

[72]  Murray R. Barrick,et al.  THE BIG FIVE PERSONALITY DIMENSIONS AND JOB PERFORMANCE: A META-ANALYSIS , 1991 .

[73]  Rachel Greenstadt,et al.  Detecting Hoaxes, Frauds, and Deception in Writing Style Online , 2012, 2012 IEEE Symposium on Security and Privacy.

[74]  Justin Zobel,et al.  Entropy-Based Authorship Search in Large Document Collections , 2007, ECIR.

[75]  Jacques Savoy,et al.  Lexical Analysis of US Political Speeches , 2010, J. Quant. Linguistics.

[76]  Donna K. Harman,et al.  Information Retrieval: The Early Years , 2019, Found. Trends Inf. Retr..

[77]  Maricor N. Soriano,et al.  Identifying the painter using texture features and machine learning algorithms , 2019, ICCSP.

[78]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[79]  Samuel C. Woolley,et al.  Algorithms, bots, and political communication in the US 2016 election: The challenge of automated political communication for election law and administration , 2018 .

[80]  Juhan Tuldava,et al.  The Development of Statistical Stylistics (A Survey) , 2004, J. Quant. Linguistics.

[81]  G. Giordan,et al.  Exploring the History of American Sociology Through Topic Modelling , 2018 .

[82]  Moshe Koppel,et al.  Determining if two documents are written by the same author , 2014, J. Assoc. Inf. Sci. Technol..

[83]  Gregory J. Park,et al.  Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook , 2016, PloS one.

[84]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[85]  W. J. Conover,et al.  Practical Nonparametric Statistics , 1972 .

[86]  Mike Kestemont,et al.  Stylometry with R: A Package for Computational Text Analysis , 2016, R J..

[87]  Vlado Keselj,et al.  Twitter User Profiling: Bot and Gender Identification , 2019, CLEF.

[88]  Serena Villata,et al.  A Multilingual Evaluation for Online Hate Speech Detection , 2020, ACM Trans. Internet Techn..

[89]  Matilde Trevisani,et al.  Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories , 2018, Knowl. Based Syst..

[90]  Pervez Rizvi The interpretation of Zeta test results , 2019, Digit. Scholarsh. Humanit..

[91]  Beborah Kalb,et al.  State Of the Union: Presidential Rhetoric From Woodrow Wilson To George W Bush , 2006 .

[92]  Benno Stein,et al.  Intrinsic plagiarism analysis , 2011, Lang. Resour. Evaluation.

[93]  Morten Nielsen,et al.  Stylometry of paintings using hidden Markov modelling of contourlet transforms , 2013, Signal Process..

[94]  Carla E. Brodley,et al.  Compression and machine learning: a new perspective on feature space vectors , 2006, Data Compression Conference (DCC'06).

[95]  R. Ketcham The Anti-Federalist Papers and the Constitutional Convention Debates , 2003 .

[96]  Matthew L. Jockers Text Analysis with R for Students of Literature , 2014 .

[97]  Duncan J. Murdoch,et al.  A First Course in Statistical Programming with R , 2007 .

[98]  David Adger,et al.  Language Unlimited: The Science Behind Our Most Creative Power , 2019 .

[99]  Joseph Rudman,et al.  The State of Authorship Attribution Studies: Some Problems and Solutions , 1997, Comput. Humanit..

[100]  H. van Halteren,et al.  Outside the cave of shadows: using syntactic annotation to enhance authorship attribution , 1996 .

[101]  E. H. Simpson Measurement of Diversity , 1949, Nature.

[102]  Carmen Gregori-Signes,et al.  Analysing Lexical Density and Lexical Diversity in University Students’ Written Discourse , 2015 .

[103]  Cyril Labbé,et al.  Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? , 2012, Scientometrics.

[104]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[105]  Gary Smith,et al.  The 9 Pitfalls of Data Science , 2019 .

[106]  F. Mosteller,et al.  Inference in an Authorship Problem , 1963 .

[107]  Roger Mitton,et al.  Spelling checkers, spelling correctors and the misspellings of poor spellers , 1987, Inf. Process. Manag..

[108]  Jacques Savoy,et al.  Vocabulary Growth Study: An Example with the State of the Union Addresses , 2015, J. Quant. Linguistics.

[109]  Arjuna Tuzzi,et al.  Improving Labbé’s Intertextual Distance: Testing a Revised Version on a Large Corpus of Italian Literature , 2013, J. Quant. Linguistics.

[110]  J. Pennebaker,et al.  The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods , 2010 .

[111]  Mark Dredze,et al.  Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media , 2016, CHI.

[112]  David L. Hoover,et al.  Teasing Out Authorship and Style with T-tests and Zeta , 2010, DH.

[113]  Paul M. B. Vitányi,et al.  Clustering by compression , 2003, IEEE Transactions on Information Theory.

[114]  Yiming Yan,et al.  Surveying Stylometry Techniques and Applications , 2017, ACM Comput. Surv..

[115]  Robert Matthews,et al.  Neural Computation in Stylometry I: An Application to the Works of Shakespeare and Fletcher , 1993 .

[116]  G. Savova,et al.  An Introduction to Natural Language Processing: How You Can Get More From Those Electronic Notes You Are Generating. , 2015, Pediatric emergency care.

[117]  Shlomo Argamon,et al.  Authorship attribution in the wild , 2010, Lang. Resour. Evaluation.

[118]  Moshe Koppel,et al.  Detecting pseudepigraphic texts using novel similarity measures , 2018, Digit. Scholarsh. Humanit..

[119]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[120]  Efstathios Stamatatos A survey of modern authorship attribution methods , 2009 .

[121]  Jan Rybicki,et al.  The stylistics and stylometry of collaborative translation: Woolf's Night and Day in Polish , 2013, Lit. Linguistic Comput..

[122]  Matthew L. Jockers,et al.  A comparative study of machine learning methods for authorship attribution , 2010, Lit. Linguistic Comput..

[123]  Margaret L. Kern,et al.  Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach , 2013, PloS one.

[124]  LabbéCyril,et al.  Duplicate and fake publications in the scientific literature , 2013 .

[125]  D. Paulhus,et al.  Explaining Donald Trump via communication style: Grandiosity, informality, and dynamism , 2017 .

[126]  Efstathios Stamatatos,et al.  Authorship Attribution Based on Feature Set Subspacing Ensembles , 2006, Int. J. Artif. Intell. Tools.

[127]  R. Harald Baayen,et al.  Analyzing linguistic data: a practical introduction to statistics using R, 1st Edition , 2008 .

[128]  Jacques Savoy,et al.  Evaluation of text representation schemes and distance measures for authorship linking , 2019, Digit. Scholarsh. Humanit..

[129]  J. Nye Presidential Leadership and the Creation of the American Era , 2013 .

[130]  Bin Ma,et al.  The similarity metric , 2001, IEEE Transactions on Information Theory.

[131]  L. P. Ribuffo,et al.  How to Win Votes and Influence Congress@@@Presidential Power and the Modern Presidents: The Politics of Leadership from Roosevelt to Reagan. , 1960 .

[132]  Maria Simi,et al.  Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization , 2000, ECDL.

[133]  Graeme Hirst,et al.  Segmenting documents by stylistic character , 2005, Natural Language Engineering.

[134]  G. Smith The AI Delusion , 2018 .

[135]  Jacques Savoy,et al.  Estimating the probability of an authorship attribution , 2016, J. Assoc. Inf. Sci. Technol..

[136]  Nektaria Potha,et al.  Improving author verification based on topic modeling , 2019, J. Assoc. Inf. Sci. Technol..

[137]  G. Taylor,et al.  The New Oxford Shakespeare: Authorship Companion , 2017 .

[138]  A. Miranda-García,et al.  The Authorship of the Disputed Federalist Papers with an Annotated Corpus , 2012 .

[139]  Mark Turner,et al.  Clear and Simple as the Truth: Writing Classic Prose , 1997 .

[140]  Marshall S. Smith,et al.  The general inquirer: A computer approach to content analysis. , 1967 .

[141]  Robert Bosch,et al.  Separating Hyperplanes and the Authorship of the Disputed Federalist Papers , 1998 .

[142]  Matthias Hagen,et al.  Overview of the Author Obfuscation Task at PAN 2018: A New Approach to Measuring Safety , 2018, CLEF.

[143]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[144]  T C Mendenhall,et al.  THE CHARACTERISTIC CURVES OF COMPOSITION. , 1887, Science.

[145]  David J. Harper,et al.  Using compression based language models for text categorization. , 2003 .

[146]  Filippo Menczer,et al.  Detection of Promoted Social Media Campaigns , 2016, ICWSM.

[147]  Rotem Dror,et al.  Statistical Significance Testing for Natural Language Processing , 2020, Synthesis Lectures on Human Language Technologies.

[148]  Stuart Soroka,et al.  Affective News: The Automated Coding of Sentiment in Political Texts , 2012 .

[149]  Filippo Menczer,et al.  Online Human-Bot Interactions: Detection, Estimation, and Characterization , 2017, ICWSM.

[150]  Ernst Stadlober,et al.  The relation between word length and sentence length: an intra-systematic perspective in the core data structure , 2008, Glottometrics.

[151]  Naomie Salim,et al.  Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[152]  Douglas Biber,et al.  Longman Student Grammar of Spoken and Written English , 2002 .

[153]  Stefan Thomas Gries,et al.  Quantitative Corpus Linguistics with R: A Practical Introduction , 2009 .

[154]  Joseph Rudman,et al.  The Twelve Disputed 'Federalist' Papers: A Case for Collaboration , 2012, DH.

[155]  Gene Tsudik,et al.  Exploring Linkability of User Reviews , 2012, ESORICS.

[156]  Jacques Savoy,et al.  Text representation strategies: An example with the State of the union addresses , 2016, J. Assoc. Inf. Sci. Technol..

[157]  S. Pinker The Sense of Style , 2014 .

[158]  Paolo Rosso,et al.  On the use of character n-grams as the only intrinsic evidence of plagiarism , 2019, Language Resources and Evaluation.

[159]  Efstathios Stamatatos,et al.  Authorship Attribution for Social Media Forensics , 2017, IEEE Transactions on Information Forensics and Security.

[160]  David Mitchell,et al.  Type-token models: a comparative study , 2015, J. Quant. Linguistics.

[161]  Thierry Gautier,et al.  Semi-automated fact-checking of nucleotide sequence reagents in biomedical research publications: The Seek & Blastn tool , 2019, PloS one.

[162]  E. Paradis Analysis of Phylogenetics and Evolution with R , 2006 .

[163]  Benno Stein,et al.  Overview of PAN'17 - Author Identification, Author Profiling, and Author Obfuscation , 2017, CLEF.

[164]  Tim D. Grant TXT 4N6:method, consistency, and distinctiveness in the analysis of sms text messages , 2013 .

[165]  Sharath Chandra Guntuku,et al.  Detecting depression and mental illness on social media: an integrative review , 2017, Current Opinion in Behavioral Sciences.

[166]  David L. Hoover,et al.  Testing Burrows's Delta , 2004, Lit. Linguistic Comput..

[167]  Claire Cardie,et al.  OpinionFinder: A System for Subjectivity Analysis , 2005, HLT.

[168]  Ismail Kassou,et al.  Authorship Analysis Studies: A Survey , 2014 .

[169]  John D. Burger,et al.  Discriminating Gender on Twitter , 2011, EMNLP.

[170]  Rachel Greenstadt,et al.  Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity , 2012, TSEC.

[171]  Richard M. Schwartz,et al.  Neural-Network Lexical Translation for Cross-lingual IR from Text and Speech , 2019, SIGIR.

[172]  Colleen J. Shogan,et al.  The President’s State of the Union Address: Tradition, Function, and Policy Implications , 2012 .

[173]  Patrick Juola,et al.  Authorship Attribution , 2008, Found. Trends Inf. Retr..

[174]  Mark Steyvers,et al.  Detecting authorship deception: a supervised machine learning approach using author writeprints , 2012, Lit. Linguistic Comput..

[175]  Trevor Hastie,et al.  An Introduction to Statistical Learning , 2013, Springer Texts in Statistics.

[176]  P. Bearman,et al.  Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014 , 2015, Proceedings of the National Academy of Sciences.

[177]  Sushil Jajodia,et al.  Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg? , 2012, IEEE Transactions on Dependable and Secure Computing.

[178]  Alan Hanjalic,et al.  Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors , 2019, SIGIR.

[179]  Sandro Skansi,et al.  Introduction to Deep Learning , 2018, Undergraduate Topics in Computer Science.

[180]  Jürgen Schmidhuber,et al.  LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[181]  W. Shakespeare The Two Noble Kinsmen , 1970 .

[182]  Emery D. Berger,et al.  ExceLint: automatically finding spreadsheet formula errors , 2018, Proc. ACM Program. Lang..

[183]  Guillaume Desagulier,et al.  Corpus Linguistics and Statistics with R , 2017 .

[184]  J. Pennebaker The Secret Life of Pronouns: What Our Words Say About Us , 2011 .

[185]  Benno Stein,et al.  Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection , 2019, CLEF.

[186]  David L. Hoover,et al.  An exercise in non-ideal authorship attribution: the mysterious Maria Ward , 2009, Lit. Linguistic Comput..

[187]  Jacques Savoy,et al.  A simple and efficient algorithm for authorship verification , 2017, J. Assoc. Inf. Sci. Technol..

[188]  Xin Rong,et al.  word2vec Parameter Learning Explained , 2014, ArXiv.

[189]  P. Plecháč,et al.  Versification and authorship attribution. A pilot study on Czech, German, Spanish, and English poetry , 2019, Studia Metrica et Poetica.

[190]  Benno Stein,et al.  A Decade of Shared Tasks in Digital Text Forensics at PAN , 2019, ECIR.

[191]  H. Sichel On a Distribution Law for Word Frequencies , 1975 .

[192]  Shlomo Argamon,et al.  Automatically profiling the author of an anonymous text , 2009, CACM.

[193]  Malvina Nissim,et al.  Bleaching Text: Abstract Features for Cross-lingual Gender Prediction , 2018, ACL.

[194]  P. Seargeant The Emoji Revolution , 2019 .

[195]  W. Oliveira,et al.  Comparing compression models for authorship attribution. , 2013, Forensic science international.

[196]  Efstathios Stamatatos,et al.  Authorship Attribution Using Text Distortion , 2017, EACL.

[197]  D. Holmes The Evolution of Stylometry in Humanities Scholarship , 1998 .

[198]  R. Hart Trump and Us , 2020 .

[199]  Scott McCrea John Florio: The Man Who Was Shakespeare , by Lamberto Tassinari , 2010 .

[200]  Charu C. Aggarwal,et al.  Mining Text Streams , 2012, Mining Text Data.

[201]  Donna Harman,et al.  How effective is suffixing , 1991 .

[202]  A. Miranda-García,et al.  Yule’s Characteristic K Revisited , 2005 .

[203]  Peter Dalgaard,et al.  Introductory statistics with R , 2002, Statistics and computing.

[204]  Roberto Di Pietro,et al.  DNA-Inspired Online Behavioral Modeling and Its Application to Spambot Detection , 2016, IEEE Intell. Syst..

[205]  Daniel Dajun Zeng,et al.  Behavior enhanced deep bot detection in social media , 2017, 2017 IEEE International Conference on Intelligence and Security Informatics (ISI).

[206]  Petter Törnberg,et al.  Echo chambers and viral misinformation: Modeling fake news as complex contagion , 2018, PloS one.

[207]  D. Holmes A Stylometric Analysis of Mormon Scripture and Related Texts , 1992 .

[208]  Paolo Rosso,et al.  On the impact of emotions on author profiling , 2016, Inf. Process. Manag..

[209]  Lauren M. Stuart,et al.  On Identifying Authors with Style , 2013, 2013 IEEE International Conference on Systems, Man, and Cybernetics.

[210]  David Crystal ‘Think on my words’: Index , 2008 .

[211]  J. H. Lee,et al.  n-Gram-based indexing for Korean text retrieval , 1999, Inf. Process. Manag..

[212]  James C. Humes Confessions of a White House Ghostwriter: Five Presidents and Other Political Adventures , 1997 .

[213]  Benjamin C. M. Fung,et al.  Learning Stylometric Representations for Authorship Analysis , 2016, IEEE Transactions on Cybernetics.

[214]  Pervez Rizvi,et al.  An improvement to Zeta , 2018, Digit. Scholarsh. Humanit..

[215]  D. Holmes,et al.  The Federalist Revisited: New Directions in Authorship Attribution , 1995 .

[216]  Emilio Ferrara,et al.  Bots increase exposure to negative and inflammatory content in online social systems , 2018, Proceedings of the National Academy of Sciences.

[217]  David L. Hoover,et al.  Another Perspective on Vocabulary Richness , 2003, Comput. Humanit..

[218]  Cindy K. Chung,et al.  Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates , 2007 .

[219]  Edward Jimenez,et al.  Exploring Performance-Based Music Attributes for Stylometric Analysis , 2009 .

[220]  Hsinchun Chen,et al.  Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace , 2008, TOIS.

[221]  Xia Hu,et al.  Techniques for interpretable machine learning , 2018, Commun. ACM.

[222]  Efstathios Stamatatos,et al.  Computer-Based Authorship Attribution Without Lexical Measures , 2001, Comput. Humanit..

[223]  Omer Levy,et al.  Linguistic Regularities in Sparse and Explicit Word Representations , 2014, CoNLL.

[224]  Peter Dixon,et al.  Sentence-length and Authorship Attribution: the Case of Oliver Goldsmith , 2004, Lit. Linguistic Comput..

[225]  Matthew L. Jockers Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification , 2013, Lit. Linguistic Comput..

[226]  Michele A. Cortelazzo,et al.  It Takes Many Hands to Draw Elena Ferrante’s Profile , 2018 .

[227]  Fabio Crestani,et al.  Overview of eRisk: Early Risk Prediction on the Internet (Extended Lab Overview) , 2018, CLEF.

[228]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[229]  Victoria Johansson,et al.  Lexical diversity and lexical density in speech and writing , 2009 .

[230]  Arjuna Tuzzi,et al.  What is Elena Ferrante? A comparative analysis of a secretive bestselling Italian writer , 2018, Digit. Scholarsh. Humanit..

[231]  Antonio Miranda García,et al.  Function Words in Authorship Attribution Studies , 2007, Lit. Linguistic Comput..

[232]  Sasa Adamovic,et al.  Automated language‐independent authorship verification (for Indo‐European languages) , 2019, J. Assoc. Inf. Sci. Technol..

[233]  William V. Ridings,et al.  Rating the Presidents: A Ranking of U.S. Leaders, from the Great and Honorable to the Dishonest and Incompetent , 1997 .

[234]  Hans Van Halteren,et al.  Author verification by linguistic profiling: An exploration of the parameter space , 2007, TSLP.

[235]  Adam Tauman Kalai,et al.  Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.

[236]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[237]  Walter Daelemans,et al.  The effect of author set size and data size in authorship attribution , 2011, Lit. Linguistic Comput..

[238]  Shlomo Argamon,et al.  Automatically Categorizing Written Texts by Author Gender , 2002, Lit. Linguistic Comput..

[239]  Sung-Hyuk Cha Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions , 2007 .

[240]  M. Mémet,et al.  Making a Point , 2019, Fake Physics: Spoofs, Hoaxes and Fictitious Science.

[241]  P. Stone,et al.  Verbal Style and the Presidency: A Computer-Based Analysis. , 1985 .

[242]  Mike Kestemont,et al.  Collaborative authorship: Conrad, Ford and Rolling Delta , 2014, Lit. Linguistic Comput..

[243]  M. F. Porter,et al.  An algorithm for suffix stripping , 1997 .

[244]  Maciej Eder,et al.  Visualization in stylometry: Cluster analysis using networks , 2017, Digit. Scholarsh. Humanit..

[245]  Paolo Rosso,et al.  Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling in Twitter , 2019, CLEF.

[246]  J. Tulis The Rhetorical Presidency , 1987 .

[247]  Ryan L. Boyd,et al.  Language-based personality: a new approach to personality in a digital world , 2017, Current Opinion in Behavioral Sciences.

[248]  J. M. Hughes,et al.  Quantitative patterns of stylistic influence in the evolution of literature , 2012, Proceedings of the National Academy of Sciences.

[249]  R. Burchfield Frequency Analysis of English Usage: Lexicon and Grammar. By W. Nelson Francis and Henry Kučera with the assistance of Andrew W. Mackie. Boston: Houghton Mifflin. 1982. x + 561 , 1985 .

[250]  Jacques Savoy,et al.  Analysis of the style and the rhetoric of the 2016 US presidential primaries , 2018, Digit. Scholarsh. Humanit..

[251]  Matthias Hagen,et al.  Author Obfuscation: Attacking the State of the Art in Authorship Verification , 2016, CLEF.

[252]  Julinda Stefa,et al.  Bot and Gender Detection of Twitter Accounts Using Distortion and LSA , 2019, CLEF.

[253]  Jack Elliott,et al.  Language chunking, data sparseness, and the value of a long marker list: explorations with word n-grams and authorial attribution , 2014, Lit. Linguistic Comput..

[254]  Thomas Merriam,et al.  LETTER FREQUENCY AS A DISCRIMINATOR OF AUTHORS , 1994 .

[255]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[256]  V. S. Subrahmanian,et al.  Using sentiment to detect bots on Twitter: Are humans more opinionated than bots? , 2014, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014).

[257]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[258]  Lawrence D. Jackel,et al.  An analog neural network processor with programmable topology , 1991 .

[259]  Pankaj Rohatgi,et al.  Can Pseudonymity Really Guarantee Privacy? , 2000, USENIX Security Symposium.

[260]  David M. Blei,et al.  Probabilistic topic models , 2012, Commun. ACM.

[261]  Karen Sparck Jones A statistical interpretation of term specificity and its application in retrieval , 1972 .

[262]  SmadjaFrank Retrieving collocations from text , 1993 .

[263]  R A Hardcastle,et al.  CUSUM: a credible method for the determination of authorship? , 1997, Science & justice : journal of the Forensic Science Society.

[264]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[265]  S. Fienberg,et al.  Inference and Disputed Authorship: The Federalist , 1966 .

[266]  Jacques Savoy,et al.  Comparative evaluation of term selection functions for authorship attribution , 2015, Digit. Scholarsh. Humanit..

[267]  A. Nelson Shakespeare Beyond Doubt: The life and theatrical interests of Edward de Vere, seventeenth Earl of Oxford , 2013 .

[268]  Omid Mohamad Nezami,et al.  ShEMO - A Large-Scale Validated Database for Persian Speech Emotion Detection , 2019, ArXiv.

[269]  Fred J. Damerau,et al.  The use of function word frequencies as indicators of style , 1975 .

[270]  Efstathios Stamatatos,et al.  Plagiarism and authorship analysis: introduction to the special issue , 2011, Lang. Resour. Evaluation.

[271]  P. Dixon,et al.  Goldsmith's Periodical Essays: A Statistical Analysis of Eleven Doubtful Cases , 1993 .

[272]  Glenn Fung,et al.  The disputed federalist papers: SVM feature selection via concave minimization , 2003, TAPIA '03.

[273]  Constantina Stamou,et al.  Stylochronometry: Stylistic Development, Sequence of Composition, and Relative Dating , 2007, Lit. Linguistic Comput..

[274]  Arjuna Tuzzi,et al.  A portrait of JASA: the History of Statistics through analysis of keyword counts in an early scientific journal , 2014, Quality & Quantity.

[275]  Patrick Juola,et al.  Analyzing Stylometric Approaches to Author Obfuscation , 2011, IFIP Int. Conf. Digital Forensics.

[276]  Trevor Hastie,et al.  Computer Age Statistical Inference: Algorithms, Evidence, and Data Science , 2016 .

[277]  Cyril Labbé,et al.  How to Measure the Meanings of Words? Amour in Corneille’s Work , 2005, Lang. Resour. Evaluation.

[278]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[279]  David H. Wolpert,et al.  The Lack of A Priori Distinctions Between Learning Algorithms , 1996, Neural Computation.

[280]  Patrick Juola,et al.  The Rowling Case: A Proposed Standard Analytic Protocol for Authorship Questions , 2015, Digit. Scholarsh. Humanit..

[281]  Vittorio Loreto,et al.  Language trees and zipping. , 2002, Physical review letters.

[282]  John Burrows,et al.  'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship , 2002, Lit. Linguistic Comput..

[283]  Jacques Savoy,et al.  UniNE at PAN-CLEF 2019: Bots and Gender Task , 2019, CLEF.

[284]  Isabella Reger,et al.  Understanding and explaining Delta measures for authorship attribution , 2017, Digit. Scholarsh. Humanit..

[285]  Michael Gamon,et al.  Obfuscating Document Stylometry to Preserve Author Anonymity , 2006, ACL.

[286]  Walter Daelemans,et al.  Explanation in Computational Stylometry , 2013, CICLing.

[287]  R. Forsyth Stylochronometry with substrings, or : a poet young and old , 1999 .

[288]  R. Harald Baayen,et al.  Word Frequency Distributions , 2001 .

[289]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[290]  Thomas L. Griffiths,et al.  Learning author-topic models from text corpora , 2010, TOIS.

[291]  Shlomo Argamon,et al.  Interpreting Burrows's Delta: Geometric and Probabilistic Foundations , 2007, Lit. Linguistic Comput..

[292]  G. Lakoff,et al.  The Little Blue Book: The Essential Guide to Thinking and Talking Democratic , 2012 .

[293]  Inna Vogel,et al.  Authorship Verification in the Absence of Explicit Features and Thresholds , 2018, ECIR.

[294]  Helena Gómez-Adorno,et al.  Bots and Gender Profiling using Character Bigrams , 2019, CLEF.

[295]  Matthew L. Jockers,et al.  Reassessing authorship of the Book of Mormon using delta and nearest shrunken centroid classification , 2008, Lit. Linguistic Comput..

[296]  Moshe Koppel,et al.  Exploiting Stylistic Idiosyncrasies for Authorship Attribution , 2003 .

[297]  E. Lim Five Trends in Presidential Rhetoric: An Analysis of Rhetoric from George Washington to Bill Clinton , 2002 .

[298]  Fuchun Peng,et al.  N-GRAM-BASED AUTHOR PROFILES FOR AUTHORSHIP ATTRIBUTION , 2003 .

[299]  Moshe Koppel,et al.  Identifying Distinct Components of a Multi-author Document , 2012, 2012 European Intelligence and Security Informatics Conference.

[300]  Thomas Merriam,et al.  Shakespeare, Fletcher, and the Two Noble Kinsmen , 1994 .

[301]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[302]  Diane M. Strong,et al.  Data quality in context , 1997, CACM.

[303]  Moshe Koppel,et al.  Measuring Differentiability: Unmasking Pseudonymous Authors , 2007, J. Mach. Learn. Res..

[304]  Jacques Savoy,et al.  Clustering : An Application with the State of the Union Addresses , 2015 .

[305]  N. M. Laan Stylometry and Method. The Case of Euripides , 1995 .

[306]  Jacques Savoy,et al.  Distributed language representation for authorship attribution , 2018, Digit. Scholarsh. Humanit..

[307]  David Crystal ‘Think on my words’: Frontmatter , 2008 .

[308]  Shlomo Argamon,et al.  Computational methods in authorship attribution , 2009 .

[309]  Dmitry V. Khmelev,et al.  Using Literal and Grammatical Statistics for Authorship Attribution , 2001, Probl. Inf. Transm..

[310]  B. Efron,et al.  Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63 , 1976 .

[311]  David Crystal,et al.  The Cambridge Encyclopedia of Language , 2012, Modern Language Review.

[312]  Emilio Ferrara,et al.  Deep Neural Networks for Bot Detection , 2018, Inf. Sci..

[313]  J. F. Burrows,et al.  Not Unles You Ask Nicely: The Interpretative Nexus Between Analysis and Information , 1992 .

[314]  Patrick Juola,et al.  Correlations and Potential Cross-Linguistic Indicators of Writing Style , 2019, J. Quant. Linguistics.

[315]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[316]  Jacques Savoy,et al.  Distance measures in author profiling , 2017, Information Processing & Management.

[317]  Roberto Di Pietro,et al.  Social Fingerprinting: Detection of Spambot Groups Through DNA-Inspired Behavioral Modeling , 2017, IEEE Transactions on Dependable and Secure Computing.

[318]  J. Rudman Unediting, De-Editing, and Editing in Nontraditional Authorship Attribution Studies: With an Emphasis on the Canon of Daniel Defoe , 2005, The Papers of the Bibliographical Society of America.

[319]  Helmut Schmid,et al.  Improvements in Part-of-Speech Tagging with an Application to German , 1999 .

[320]  Douglas Biber,et al.  Representativeness in corpus design , 1993 .

[321]  Christian Winter,et al.  On the Usefulness of Compression Models for Authorship Verification , 2017, ARES.

[322]  Helena Gómez-Adorno,et al.  Bots and Gender Identification Based on Stylometry of Tweet Minimal Structure and n-grams Model , 2019, CLEF.

[323]  D. Labbé Si deux et deux sont quatre, Molière n'a pas écrit Dom Juan , 2009 .

[324]  Benno Stein,et al.  Clustering by Authorship Within and Across Documents , 2016, CLEF.

[325]  Malcolm Farrow,et al.  Use of the Chi-Squared Test to Examine Vocabulary Differences in English Language Corpora Representing Seven Different Countries , 2007, Lit. Linguistic Comput..

[326]  Bradley Kjell,et al.  Authorship Determination Using Letter Pair Frequency Features with Neural Network Classifiers , 1995 .

[327]  Carol W. Gelderman All the Presidents' Words: The Bully Pulpit and the Creation of the Virtual Presidency , 1997 .

[328]  David Crystal,et al.  A Little Book of Language , 2010 .

[329]  Mark Dredze,et al.  Quantifying Mental Health Signals in Twitter , 2014, CLPsych@ACL.

[330]  B. Efron,et al.  Did Shakespeare write a newly-discovered poem? , 1987 .

[331]  David I. Holmes,et al.  Who Was the Author? An Introduction to Stylometry , 2003 .

[332]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[333]  Fionn Murtagh,et al.  Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion? , 2011, Journal of Classification.

[334]  Ning Wu,et al.  On Compression-Based Text Classification , 2005, ECIR.

[335]  R. Leigh,et al.  A Scientific Approach to the Shakespeare Authorship Question , 2019, SAGE Open.

[336]  James Mayfield,et al.  Character N-Gram Tokenization for European Language Text Retrieval , 2004, Information Retrieval.

[337]  D. Labbé Romain Gary et Emile Ajar , 2008 .

[338]  Patrick Juola,et al.  Psychological profiling through textual analysis , 2013, Lit. Linguistic Comput..