Preparing next-generation scientists for biomedical big data: artificial intelligence approaches.

Personalized medicine is being realized through our ability to measure biological and environmental information about patients. Many of these data are stored in electronic health records, yielding big data that pose challenges for management and analysis. Here, we review several areas of knowledge that next-generation scientists need in order to fully realize the potential of biomedical big data. We begin with an overview of big data and their storage and management. We then review statistics and data science as foundational topics, followed by a core curriculum of artificial intelligence, machine learning, and natural language processing, which are needed to develop predictive models for clinical decision making. We end with specific training recommendations for preparing next-generation scientists for biomedical big data.
