A Survey on Stylometric Text Features

Stylometric features are among the ways in which an individual's style is expressed in natural language. They can be detected automatically using computational linguistics methods. In this survey, we systematize recent studies devoted to the extraction and application of stylometric features in natural language processing tasks: authorship attribution, authorship verification, style change detection, author profiling, and text classification by genre and sentiment. To that end, we define the categories of stylometric features that yield the most effective solutions, discuss the reasons for their successful application, outline the limitations of approaches based on them, and suggest directions for future research.
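To make "automatically detected" stylometric features concrete, here is a minimal sketch that computes a few features commonly used in the field: average word length, type-token ratio, average sentence length, relative frequencies of function words, and character n-gram counts. The function names, the ten-word function-word list, and the regular-expression tokenizer and sentence splitter are illustrative assumptions, not the feature categories defined in this survey.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = ("the", "of", "and", "a", "to", "in", "is", "that", "it", "was")


def stylometric_features(text: str) -> dict:
    """Compute a few simple lexical and syntactic-proxy stylometric features."""
    # Naive tokenizer: lowercase alphabetic runs (apostrophes kept).
    words = re.findall(r"[a-z']+", text.lower())
    # Naive sentence splitter on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = len(words)
    counts = Counter(words)

    features = {
        # Average word length in characters.
        "avg_word_len": sum(map(len, words)) / total if total else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(counts) / total if total else 0.0,
        # Average sentence length in words.
        "avg_sentence_len": total / len(sentences) if sentences else 0.0,
    }
    # Relative frequency of each function word (largely topic-independent).
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / total if total else 0.0
    return features


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, keeping spaces and punctuation."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


if __name__ == "__main__":
    sample = "The cat sat on the mat. It was happy."
    print(stylometric_features(sample))
    print(char_ngrams(sample, 3).most_common(5))
```

In an authorship attribution setting, such vectors would typically be computed for texts of known authorship and passed to a classifier; which feature categories work best for each task is the question this survey examines.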