Maximum Value Matters: Finding Hot Topics in Scholarly Fields

Finding hot topics in scholarly fields helps researchers keep up with the latest concepts, trends, and inventions in their field of interest. Because complete large-scale scholarly data are rare, earlier studies approached this problem through manual topic extraction from a limited number of domains, focusing solely on a single feature such as coauthorship or citation relations. Given the compromised effectiveness of such predictions, in this paper we use a real scholarly dataset from Microsoft Academic Graph, which provides more than 12,000 topics in the field of Computer Science (CS), including 1,200 venues, 14.4 million authors, and 30 million papers with their citation relations, covering the period from 1950 to the present. Aiming to find the topics that will trend in CS, we formalize a hot topic prediction problem in which, with joint consideration of both inter- and intra-topical influence, 17 different scientific features are extracted to describe topic status comprehensively. By leveraging all 17 features, we forecast topic scale 5 and 10 years ahead with R² values of 0.9893 and 0.9646, respectively. Interestingly, our prediction suggests that the maximum value matters in finding hot topics in scholarly fields, in three respects: (1) the maximum value of each factor, such as authors' maximum h-index and largest citation number, provides three times as much information for prediction as the average value; (2) the mutual influence between the most correlated topics serves as the most telling factor in long-term topic trend prediction, suggesting that topics currently exhibiting the maximum growth rates will drive their correlated topics to become hot in the future; (3) we predict the 100 fastest-growing topics (those with the maximum growth rate) that are likely to attract the most attention in CS over the next 5 years.
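To make the contrast between maximum-value and average-value features concrete, the following is a minimal sketch rather than the paper's implementation. It assumes per-topic lists of author h-indices as toy input, and the names `aggregate_topic_features` and `r_squared` are hypothetical. It shows how both aggregates could be derived for each topic and how an R² score of the kind reported above is computed from actual and predicted topic scales.

```python
from statistics import mean


def aggregate_topic_features(h_indices_by_topic):
    """Summarize each topic by the maximum and mean author h-index.

    Hypothetical helper: the abstract names "authors' maximum h-index"
    as one feature, but does not specify how features are computed.
    """
    return {
        topic: {"max_h_index": max(values), "mean_h_index": mean(values)}
        for topic, values in h_indices_by_topic.items()
    }


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data, invented purely for illustration.
h_indices_by_topic = {
    "topic_a": [3, 10, 42],
    "topic_b": [5, 6, 7],
}
features = aggregate_topic_features(h_indices_by_topic)

# Toy "topic scale" values (e.g. paper counts) and model predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
score = r_squared(actual, predicted)

print(features)
print(round(score, 4))
```

In this toy input, `topic_a` has a single high-h-index author that the maximum captures and the mean dilutes, which is the kind of signal the abstract attributes to maximum-value features.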
