Artificial General Intelligence for Medical Imaging

In this review, we explore the potential applications of Artificial General Intelligence (AGI) models in healthcare, focusing on foundational Large Language Models (LLMs), Large Vision Models, and Large Multimodal Models. We emphasize the importance of integrating clinical expertise, domain knowledge, and multimodal capabilities into AGI models. In addition, we lay out key roadmaps that guide the development and deployment of healthcare AGI models. Throughout the review, we provide critical perspectives on the potential challenges and pitfalls associated with deploying large-scale AGI models in the medical field. This comprehensive review aims to offer insights into the future implications of AGI in medical imaging, healthcare and beyond.
