Temporal Topic Modeling of Scholarly Publications for Future Trend Forecasting

The volume of scholarly articles published each year has grown exponentially. With this growth in both core and interdisciplinary areas of research, analyzing research trends can help new researchers and organizations geared towards collaborative work. Existing approaches use unsupervised learning methods such as clustering to group articles with similar characteristics for topic discovery, but with low accuracy. Efficient and fast topic discovery models and future trend forecasters can help in building intelligent applications such as recommender systems for scholarly articles. In this paper, a novel approach is proposed that automatically discovers topics (latent factors) from a large set of text documents using association rule mining on frequent itemsets. Temporal correlation analysis is used to find correlations between topics, for improved prediction. To predict the popularity of a topic in the near future, time series analysis based on a set of topic vectors is performed. For experimental validation, a dataset comprising 17 years' worth of computer science scholarly articles published through standard IEEE conferences was used, and the proposed approach achieved meaningful results.
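
As a rough illustration of the pipeline outlined above, the following minimal sketch mines frequent keyword itemsets, derives association rules from them, and extrapolates a topic's yearly count. It is not the paper's implementation: the function names, the brute-force itemset enumeration, the thresholds, and the least-squares linear trend (standing in for the unspecified time series analysis) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Count keyword sets (up to max_size) and keep those found in >= min_support docs.

    Brute-force enumeration; an assumption for illustration, not the paper's miner.
    """
    counts = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def association_rules(itemsets, min_confidence):
    """Derive rules a -> b from frequent pairs; confidence = support(a, b) / support(a)."""
    rules = []
    for items, support in itemsets.items():
        if len(items) != 2:
            continue
        for a, b in (items, items[::-1]):
            # Any subset of a frequent pair is itself frequent, so (a,) is present.
            confidence = support / itemsets[(a,)]
            if confidence >= min_confidence:
                rules.append((a, b, confidence))
    return rules


def linear_forecast(yearly_counts, steps_ahead=1):
    """Fit a least-squares line to yearly counts and extrapolate steps_ahead years.

    A simple stand-in for time series analysis; needs at least two observations.
    """
    n = len(yearly_counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(yearly_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, yearly_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical keyword lists, one per article.
docs = [["topic", "model"], ["topic", "model", "lda"], ["topic", "cluster"]]
itemsets = frequent_itemsets(docs, min_support=2)
rules = association_rules(itemsets, min_confidence=0.9)
forecast = linear_forecast([1, 2, 3, 4])
```

Here each frequent itemset plays the role of a candidate topic, and a per-year count of the articles matching it gives the series passed to `linear_forecast`. A real system would likely use a dedicated frequent-itemset miner and a proper forecasting model in place of these stand-ins.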
