Complementing Machine Learning Classifiers via Dynamic Symbolic Execution: "Human vs. Bot Generated" Tweets

Recent machine learning approaches for classifying text as human-written or bot-generated rely on training sets that are large, carefully labeled, and representative of the underlying domain. While valuable, these approaches ignore programs as an additional source of training samples. To address this problem of incomplete training sets, this paper proposes to systematically supplement existing training sets with samples inferred via program analysis. In our preliminary evaluation, training sets enriched with samples inferred via dynamic symbolic execution improved classifier accuracy for simple string-generating programs.
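
The idea of enriching a training set with program-derived samples can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy bot `bot_tweet`, the helpers `explore_paths`, `enrich`, and `classify`, and the seed data are all hypothetical. In particular, exhaustive enumeration of boolean inputs stands in for dynamic symbolic execution, which would instead collect path constraints and ask a constraint solver for inputs that reach unexplored paths.

```python
from itertools import product


def bot_tweet(promo: bool, urgent: bool) -> str:
    """Hypothetical string-generating bot with two branch conditions."""
    text = "Check out our deal" if promo else "New post is live"
    if urgent:
        text += " now!!!"
    return text


def explore_paths(program, n_inputs):
    """Collect the distinct outputs of `program` over all boolean inputs.

    Stand-in for dynamic symbolic execution: a real engine would negate
    path constraints and call a solver instead of enumerating inputs.
    """
    outputs = {program(*bits) for bits in product([False, True], repeat=n_inputs)}
    return sorted(outputs)


def enrich(training_set, program, n_inputs):
    """Append program outputs not yet in the training set, labeled 'bot'."""
    known = {text for text, _ in training_set}
    extra = [(t, "bot") for t in explore_paths(program, n_inputs) if t not in known]
    return training_set + extra


def classify(training_set, text):
    """Toy 1-nearest-neighbour classifier using Jaccard word overlap."""
    words = set(text.lower().split())

    def similarity(sample):
        other = set(sample[0].lower().split())
        return len(words & other) / len(words | other)

    return max(training_set, key=similarity)[1]


# Hypothetical seed set: it covers only one of the bot's four paths.
seed = [
    ("Had a great coffee with friends this morning", "human"),
    ("Check out our deal", "bot"),
]
enriched = enrich(seed, bot_tweet, 2)
```

With the enriched set, a tweet produced by a path that the seed set never covered, such as `bot_tweet(False, True)`, has an exact match among the bot-labeled samples, so `classify(enriched, bot_tweet(False, True))` returns the bot label. Whether such enrichment helps on real tweets depends on the classifier and on how well the analyzed programs represent actual bots.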
