TASIS: Trend analysis system for international standards

Text mining has recently emerged as a technique for extracting meaningful trends and topics from document collections. Despite its increasing use in various research areas, no previous study has applied it to document collections of international standards. In this paper, we propose the Trend Analysis System for International Standards (TASIS), which automatically performs topic modeling and trend analysis on document collections of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendations, based on the latent Dirichlet allocation (LDA) algorithm. Offered as a Web service, TASIS performs topic modeling with user-defined parameters, such as the number of topics and the number of iterations, and its results list, for each keyword in a topic, the documents that contain that keyword. TASIS also renders a TreeMap in which each extracted topic is drawn in proportion to its size, giving a graphical view that is easier to understand.
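The workflow described above (LDA with a user-chosen number of topics and iterations, a keyword-to-document listing, and per-topic sizes for a TreeMap) can be illustrated with a minimal sketch. It is not the TASIS implementation: it assumes collapsed Gibbs sampling as the inference method, assumes topic size means the number of tokens assigned to a topic, and uses a hypothetical toy corpus. The names `fit_lda`, `top_keywords`, `keyword_documents`, `alpha`, and `beta` are illustrative only.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                          # tokens per topic
    assign = []                                             # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, topic_total


def top_keywords(topic_word, topic, n=3):
    """Return up to n most frequent words assigned to a topic."""
    ranked = sorted(topic_word[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, count in ranked[:n] if count > 0]


def keyword_documents(docs, keywords):
    """Map each keyword to the indices of the documents that contain it."""
    return {kw: [d for d, doc in enumerate(docs) if kw in doc] for kw in keywords}


# Hypothetical toy corpus of pre-tokenized documents (not real ITU-T text).
docs = [
    ["network", "routing", "protocol", "network"],
    ["video", "coding", "compression", "video"],
    ["routing", "protocol", "traffic", "network"],
    ["coding", "video", "quality", "compression"],
]

# User-defined parameters: number of topics and number of iterations.
doc_topic, topic_word, sizes = fit_lda(docs, num_topics=2, iterations=50)

for k, size in enumerate(sizes):
    keywords = top_keywords(topic_word, k)
    # `size` is the value a TreeMap tile for topic k would be scaled by.
    print(k, size, keyword_documents(docs, keywords))
```

Because sampling is stochastic, the keywords assigned to each topic depend on the seed and the iteration count; the topic sizes always sum to the total number of tokens in the corpus.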
