ALPACA: Advanced Linguistic Pattern and Concept Analysis Framework for Software Engineering Corpora

Software engineering corpora often contain domain-specific topics and linguistic patterns. Popular text analysis tools are not specifically designed to accommodate such topics and patterns. In this paper, we introduce ALPACA, a novel, customizable text analysis framework. The main function of ALPACA is to analyze topics and their trends in a text corpus. It allows users to define a topic with a few initial domain-specific keywords and expand it into a much larger set of similar topic words. This new set of words can be further expanded into a set of self-contained phrases that describe the topic more precisely. ALPACA extracts those phrases by matching input sentences against linguistic patterns, which are long sequences that mix specific words and part-of-speech tags and appear frequently in the corpus. In this paper, we demonstrate the use of ALPACA to continue the analysis of CVE security reports and to detect a new topic on mobile device vulnerabilities. YouTube link: https://www.youtube.com/watch?v=UTcMYb2o1pU
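
The two steps described above, keyword expansion and pattern-based phrase extraction, can be illustrated with a minimal Python sketch. It is not ALPACA's implementation. The abstract does not say how word similarity is measured, so cosine similarity over word vectors is an assumption here, and the toy vectors, the 0.9 threshold, the function names, and the convention that upper-case pattern items denote part-of-speech tags are all hypothetical. The input sentence is assumed to be POS-tagged already.

```python
from math import sqrt

# Hypothetical toy word vectors; a real run would learn them from the corpus.
VECTORS = {
    "overflow": (0.9, 0.1, 0.0),
    "overrun": (0.8, 0.2, 0.1),
    "buffer": (0.7, 0.3, 0.0),
    "login": (0.0, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def expand_topic(seeds, vectors, threshold=0.9):
    """Return the seeds plus every word whose vector is close to some seed.

    Assumption: "similar" means cosine similarity >= threshold.
    """
    known_seeds = [s for s in seeds if s in vectors]
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


def match_pattern(tagged, pattern):
    """Find phrases in a POS-tagged sentence that match a mixed pattern.

    tagged:  list of (word, tag) pairs.
    pattern: list of strings; upper-case items are treated as POS tags,
             all other items as literal words (compared case-insensitively).
    """
    phrases = []
    n = len(pattern)
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if all(
            (tag == item) if item.isupper() else (word.lower() == item)
            for (word, tag), item in zip(window, pattern)
        ):
            phrases.append(" ".join(word for word, _ in window))
    return phrases


if __name__ == "__main__":
    print(sorted(expand_topic(["overflow"], VECTORS)))
    sentence = [
        ("allows", "VBZ"), ("remote", "JJ"), ("attackers", "NNS"),
        ("to", "TO"), ("execute", "VB"), ("arbitrary", "JJ"), ("code", "NN"),
    ]
    print(match_pattern(sentence, ["JJ", "attackers", "to", "VB"]))
```

In this sketch a pattern such as `["JJ", "attackers", "to", "VB"]` fixes two specific words and leaves two slots open to any adjective and any base-form verb, which is one plausible reading of "sequences that mix specific words and part-of-speech tags".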
