Algorithms mention in full-text content of article from NLP domain: A comparative analysis between English and Chinese

Algorithms play an increasingly important role in scientific work, especially in data-driven research. Investigating how algorithms are mentioned in full-text papers helps us understand the use and development of algorithms in a specific domain. Current research on the mention of algorithms is limited to academic papers in a single language, which makes it hard to investigate the use of algorithms comprehensively. For example, is the mention of algorithms in Chinese conference papers consistent with that in English conference papers? To answer this question, this paper takes natural language processing (NLP) as an example and compares the mention frequency, mention location and mention time of the top 10 data-mining algorithms between papers from a well-known international conference, the Annual Meeting of the Association for Computational Linguistics (ACL), and a Chinese conference, the China National Conference on Computational Linguistics (CCL). The results show that, compared with ACL, the mention frequency of the top 10 data-mining algorithms in CCL is slightly lower and the mention time is slightly delayed, while the distribution of mention locations is similar. This study can provide a reference for research on the mention, citation and evaluation of knowledge entities.
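The three measures compared above (mention frequency, mention location and mention time) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the corpus layout, the function name `mention_stats`, the section labels and the naive case-insensitive substring matching are all assumptions made for illustration, and the toy corpus is invented.

```python
from collections import Counter


def mention_stats(papers, algorithm_names):
    """Count, per algorithm, the papers that mention it, the sections
    where it appears, and the earliest publication year of a mention.

    Assumed (hypothetical) input: each paper is a dict with a "year" and
    a "sections" mapping of section name -> section text.
    """
    stats = {
        name: {"papers": 0, "sections": Counter(), "first_year": None}
        for name in algorithm_names
    }
    for paper in papers:
        for name in algorithm_names:
            needle = name.lower()
            # Naive substring match (an assumption); short names such as
            # "EM" would need word-boundary or dictionary-based matching.
            hit_sections = [
                section
                for section, text in paper["sections"].items()
                if needle in text.lower()
            ]
            if not hit_sections:
                continue
            entry = stats[name]
            entry["papers"] += 1                    # mention frequency
            entry["sections"].update(hit_sections)  # mention location
            year = paper["year"]                    # mention time
            if entry["first_year"] is None or year < entry["first_year"]:
                entry["first_year"] = year
    return stats


# Invented toy corpus, only to show the expected input shape.
toy_corpus = [
    {"year": 2008, "sections": {
        "introduction": "Support vector machines (SVM) are widely used.",
        "experiments": "We compare SVM with k-means clustering."}},
    {"year": 2005, "sections": {
        "method": "An SVM classifier is trained on the features."}},
    {"year": 2010, "sections": {
        "related work": "Prior work relied on rule-based systems."}},
]

stats = mention_stats(toy_corpus, ["SVM", "k-means"])
for name, entry in stats.items():
    print(name, entry["papers"], dict(entry["sections"]), entry["first_year"])
```

Running the same function separately on an ACL corpus and a CCL corpus would give per-conference figures that could then be compared. A real study would also need to normalize for corpus size and handle name variants and translations of each algorithm.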
