Forecasting emerging technologies using data augmentation and deep learning

Deep learning can be used to forecast emerging technologies from patent data. However, it requires a large set of labeled patent data for training, which is difficult to obtain because of various constraints. This study proposes a novel approach that integrates data augmentation with deep learning to overcome the shortage of training samples when deep learning is applied to forecasting emerging technologies. First, a sample data set was constructed using Gartner’s hype cycle and multiple patent features. Second, a generative adversarial network (GAN) was used to generate a large number of synthetic samples (data augmentation), expanding the sample data set. Finally, a deep neural network classifier was trained on the augmented data set to forecast emerging technologies; it could predict up to 77% of the emerging technologies in a given year with high precision. The approach was applied to forecast the emerging technologies in Gartner’s 2017 hype cycles using patent data from 2000 to 2016. Four of the six emerging technologies were forecasted correctly, demonstrating the accuracy and precision of the proposed approach. This approach enables deep learning to forecast emerging technologies with limited training samples.
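The augmentation step can be illustrated with a deliberately minimal GAN written in plain Python. This is a sketch under stated assumptions, not the authors' architecture: the abstract does not specify the networks, so here the generator is linear (`x = zW + c`) and the discriminator is a logistic regression, trained with the standard non-saturating GAN losses. The function names `train_linear_gan` and `sample` and the three-dimensional feature vectors are hypothetical. A linear discriminator can essentially only push the generator toward the mean of the real data; a practical version would use multilayer networks in a deep-learning framework.

```python
import math
import random


def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def train_linear_gan(real, noise_dim=2, steps=2000, lr=0.05, batch=16, seed=0):
    """Toy GAN on feature vectors `real` (hypothetical helper, for illustration).

    Generator:     x = z W + c, z ~ N(0, I)       (linear, an assumption)
    Discriminator: D(x) = sigmoid(w . x + b)      (logistic regression, an assumption)
    D minimises -log D(x_real) - log(1 - D(G(z))); G minimises -log D(G(z)).
    Returns a function that draws n synthetic samples.
    """
    rng = random.Random(seed)
    d = len(real[0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(noise_dim)]
    c = [0.0] * d
    w = [0.0] * d
    b = 0.0

    def generate(z):
        return [sum(z[k] * W[k][j] for k in range(noise_dim)) + c[j] for j in range(d)]

    for _ in range(steps):
        x_real = [rng.choice(real) for _ in range(batch)]
        zs = [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)] for _ in range(batch)]
        x_fake = [generate(z) for z in zs]

        # Discriminator step: dLoss/dlogit is -(1 - D) for real and D for fake.
        gw, gb = [0.0] * d, 0.0
        pairs = [(x, True) for x in x_real] + [(x, False) for x in x_fake]
        for x, is_real in pairs:
            p = sigmoid(dot(w, x) + b)
            g = -(1.0 - p) if is_real else p
            gb += g
            for j in range(d):
                gw[j] += g * x[j]
        w = [w[j] - lr * gw[j] / batch for j in range(d)]
        b -= lr * gb / batch

        # Generator step through the updated discriminator:
        # dLoss/dx = -(1 - D(x)) * w, then the chain rule through x = z W + c.
        gW = [[0.0] * d for _ in range(noise_dim)]
        gc = [0.0] * d
        for z, x in zip(zs, x_fake):
            g = -(1.0 - sigmoid(dot(w, x) + b))
            for j in range(d):
                gx = g * w[j]
                gc[j] += gx
                for k in range(noise_dim):
                    gW[k][j] += z[k] * gx
        for k in range(noise_dim):
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / batch
        for j in range(d):
            c[j] -= lr * gc[j] / batch

    def sample(n):
        return [generate([rng.gauss(0.0, 1.0) for _ in range(noise_dim)]) for _ in range(n)]

    return sample


if __name__ == "__main__":
    # Hypothetical normalised patent features for a few "emerging" samples.
    emerging = [[0.8, 0.6, 0.7], [0.7, 0.9, 0.6], [0.9, 0.7, 0.8], [0.6, 0.8, 0.9]]
    sample = train_linear_gan(emerging, steps=500)
    augmented = emerging + sample(50)  # all labelled as the "emerging" class
    print(len(augmented))
```

In this sketch the synthetic vectors simply inherit the label of the class the GAN was trained on, so the scarce class can be enlarged before a classifier is fitted. Training one generator per class is an assumption made here for simplicity.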
