Mobile App Review Labeling Using LDA Similarity and Term Frequency-Inverse Cluster Frequency (TF-ICF)

User review mining has attracted many researchers to analyze and develop innovative models. The models provide technical recommendation for software developers to make decisions during software maintenance a software evolution. One of the recommendations is user review categorization. There are many categorizations have been popularly used, namely bug errors, feature requests, and noninformative. There are many methods that have been done to classify user reviews. One of the classification methods is Latent Dirichlet Allocation (LDA). LDA is a topic modelling method which ables to map hidden topics resided in a document. Thus, one of techniques to map hidden topics into categories is calculating term similarity value between hidden topic and the pre-defined signifier term list. However, the limited signifier term list of each category becomes a problem. Meanwhile Term Frequency-Inverse Corpus Frequency (TF-ICF) is able to take important terms on a cluster. Therefore, this paper introduces a method that combines TF-ICF with LDA clustering based on similarity (LDAS TF-ICF) to overcome it. The classification results were calculated by using precision, recall, and F1-score. The results show the method can outperform LDA. The best performance of LDAS TF-ICF occured when 75% expanded term list was used, given the precision, recall, dan f-measure values 0.564, 0.507, and 0.491, respectively.

[1]  Ayu Purwarianti,et al.  Combination of Latent Dirichlet Allocation (LDA) and Term Frequency-Inverse Cluster Frequency (TFxICF) in Indonesian text clustering with labeling , 2016, 2016 4th International Conference on Information and Communication Technology (ICoICT).

[2]  Burairah Hussin,et al.  Opinion Mining of Movie Review using Hybrid Method of Support Vector Machine and Particle Swarm Optimization , 2013 .

[3]  Ning Chen,et al.  AR-miner: mining informative reviews for developers from mobile app marketplace , 2014, ICSE.

[4]  Walid Maalej,et al.  Bug report, feature request, or simply praise? On automatically classifying app reviews , 2015, 2015 IEEE 23rd International Requirements Engineering Conference (RE).

[5]  Ali R. Hurson,et al.  TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams , 2006, 2006 5th International Conference on Machine Learning and Applications (ICMLA'06).

[6]  SangKeun Lee,et al.  Joint multi-grain topic sentiment: modeling semantic aspects for online reviews , 2016, Inf. Sci..

[7]  Kristina Winbladh,et al.  Analysis of user comments: An approach for software requirements evolution , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[8]  Alice H. Oh,et al.  Aspect and sentiment unification model for online review analysis , 2011, WSDM '11.

[9]  Ying Fu,et al.  Automated classification of software change messages by semi-supervised Latent Dirichlet Allocation , 2015, Inf. Softw. Technol..

[10]  Frédéric Béchet,et al.  Lsislif: CRF and Logistic Regression for Opinion Target Extraction and Sentiment Polarity Analysis , 2015, SemEval@NAACL-HLT.

[11]  Jen-Tzung Chien,et al.  A new topic-bridged model for transfer learning , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[12]  Tony Veale,et al.  An Intrinsic Information Content Metric for Semantic Similarity in WordNet , 2004, ECAI.

[13]  Hua Xu,et al.  Clustering product features for opinion mining , 2011, WSDM '11.

[14]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[15]  Xianghua Fu,et al.  Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon , 2013, Knowl. Based Syst..

[16]  Noémie Elhadad,et al.  An Unsupervised Aspect-Sentiment Model for Online Reviews , 2010, NAACL.