Machine learning on small size samples: A synthetic knowledge synthesis

One of the increasingly important technologies for dealing with the growing complexity of the digitalization of almost all human activities is artificial intelligence, more precisely machine learning. Although we live in a big data world where almost everything is digitally stored, there are many real-world situations in which researchers are faced with small data samples. The aim of the present study is to answer the following research question: What is the small data problem in machine learning, and how is it solved? Our bibliometric study showed a positive trend in the number of research publications concerning the use of small datasets and substantial growth of the research community dealing with the small dataset problem, indicating that the research field is moving toward higher maturity levels. Despite notable international cooperation, a regional concentration of research literature production in economically more developed countries was observed.
