Benchmarking Library Recognition in Tweets

Software developers often use social media such as Twitter to share programming knowledge, including new tools, sample code snippets, and programming tips. One topic they discuss is software libraries. Tweets about a library can carry useful information: a good understanding of this information, e.g., of a developer's views on a library, helps weigh the pros and cons of adopting that library and gauge the general sentiment towards it. However, recognizing whether a word actually refers to a library rather than another meaning is not trivial. For example, a tweet mentioning the word "pandas" may refer to the Python pandas library or to the animal. In this work, we created the first benchmark dataset for this problem and investigated the task of distinguishing whether a tweet refers to a programming library or to something else. Pre-trained Transformer models (PTMs) have recently achieved great success in natural language processing and computer vision. We therefore extensively evaluated a broad set of modern PTMs, both general-purpose and domain-specific, on this library recognition task in tweets. Experimental results show that PTMs outperform the best-performing baseline methods by 5%-12% in terms of F1-score under within-, cross-, and mixed-library settings.
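The task described above is a binary classification problem (library mention vs. other meaning), and the models are compared by F1-score on the library class. As a minimal illustration of that metric, the sketch below computes F1 in plain Python; the tweet labels are hypothetical examples, not data from the benchmark.

```python
# Minimal sketch of the F1-score used to compare models on the
# library-recognition task. Labels: 1 = the tweet refers to the
# software library, 0 = another meaning (e.g., the animal "pandas").
# The gold/pred vectors below are hypothetical illustrations.

def f1_score(gold, pred, positive=1):
    """F1 of the positive class from gold labels and predictions."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Five hypothetical "pandas" tweets: three library mentions, two about
# the animal; the classifier misses one library mention.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 1, 0, 0]

print(round(f1_score(gold, pred), 3))  # precision 1.0, recall 2/3
```

In practice one would obtain `pred` by fine-tuning a PTM such as BERTweet or CodeBERT as a sequence classifier over labeled tweets, then report F1 separately for the within-, cross-, and mixed-library splits.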
