What do governments plan in the field of artificial intelligence?: Analysing national AI strategies using NLP

The primary goal of this paper is to explore how Natural Language Processing (NLP) techniques can assist in reviewing, understanding, and drawing conclusions from text datasets. We apply NLP techniques to analyse and extract useful information from the text of twelve national strategies on artificial intelligence (AI). For this purpose, we use a set of machine learning algorithms to (a) extract the most significant keywords and summarize each strategy document, (b) discover and assign topics to each document, and (c) cluster the strategies based on their pairwise similarity. Using the results of the analysis, we discuss the findings and highlight critical issues that emerge from the national AI strategies, such as the importance of the data ecosystem for the development of AI, the increasing attention to ethical and safety issues, and the growing ambition of many countries to lead the AI race. Using the Latent Dirichlet Allocation (LDA) topic model, we reveal the distribution of thematic sub-topics across the strategy documents. The topic distributions were then used, along with other document similarity measures, as input for clustering the strategy documents into groups. The results revealed three clusters of countries, with a visible differentiation between the strategies of China and Japan on the one hand and the Scandinavian strategies (plus the German and Luxembourgish ones) on the other. The former promote technology- and innovation-driven development plans in order to integrate AI with the economy, while the latter share a common view of the public sector's role as both a promoter of and investor in AI and as a user and beneficiary of it, and give higher priority to the ethical and safety issues connected to the development of AI.
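To make the clustering step concrete, the following is a minimal sketch of how per-document topic distributions could be compared and grouped. It is not the authors' implementation: the abstract does not specify the similarity measure or the clustering algorithm, so cosine similarity and a greedy average-linkage merge are assumptions chosen for illustration. The document names and topic proportions in `topic_mix` are hypothetical toy values, not results from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def average_linkage_clusters(vectors, n_clusters):
    """Greedy agglomerative clustering (assumed method, for illustration).

    Starts with one cluster per document and repeatedly merges the two
    clusters whose members have the highest mean pairwise cosine similarity,
    until only n_clusters remain.
    """
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        best_pair, best_score = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            sims = [
                cosine_similarity(vectors[a], vectors[b])
                for a in clusters[i]
                for b in clusters[j]
            ]
            score = sum(sims) / len(sims)
            if score > best_score:
                best_pair, best_score = (i, j), score
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical topic proportions (each row sums to 1), standing in for the
# per-document output of a topic model such as LDA.
topic_mix = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.65, 0.25, 0.10],
    "doc_c": [0.10, 0.20, 0.70],
    "doc_d": [0.15, 0.15, 0.70],
    "doc_e": [0.20, 0.60, 0.20],
}

clusters = average_linkage_clusters(topic_mix, n_clusters=3)
print(clusters)
```

In this toy input, documents with similar topic proportions end up in the same group. In practice the topic vectors would come from a fitted topic model, and the choice of similarity measure and linkage rule can change the resulting clusters.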
