Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services

Aiming at achieving artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution, i.e., caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters, making them computation- and memory-intensive for edge servers to load and execute. In this article, we investigate the edge PFM serving problem for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to evaluate the freshness and relevance of the examples in a demonstration with respect to the task being executed. Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.
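
To make the caching idea concrete, the sketch below illustrates, under stated assumptions, how an AoC-style score and a least-context eviction rule could interact at an edge server: each cached PFM accumulates context value from the demonstration examples it has served, discounted by their age and weighted by their relevance, and the model with the least accumulated context is evicted first when GPU memory runs out. The class and function names (`EdgeModelCache`, `CachedModel`, `aoc_score`), the exponential decay, and the memory figures are illustrative assumptions, not the article's exact formulation.

```python
# Minimal sketch (assumed, not the article's exact algorithm): an Age-of-Context
# style score that discounts cached in-context examples by staleness and weighs
# them by relevance, plus a "least context" eviction rule that removes the
# cached PFM with the smallest accumulated context value when edge GPU memory
# is exhausted.

import math
import time
from dataclasses import dataclass, field


@dataclass
class Example:
    """One in-context example kept in a model's demonstration buffer."""
    created_at: float   # timestamp when the example was added
    relevance: float    # similarity to the current task, in [0, 1]


def aoc_score(example: Example, now: float, decay: float = 0.01) -> float:
    """Freshness-and-relevance score: relevance discounted by the example's age."""
    age = now - example.created_at
    return example.relevance * math.exp(-decay * age)


@dataclass
class CachedModel:
    name: str
    gpu_mem_gb: float
    demonstration: list = field(default_factory=list)  # list[Example]

    def context_value(self, now: float) -> float:
        """Total AoC-style value of the model's cached demonstration examples."""
        return sum(aoc_score(e, now) for e in self.demonstration)


class EdgeModelCache:
    """Toy edge cache: evicts the model with the least context value when full."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.models: dict[str, CachedModel] = {}

    def _used_gb(self) -> float:
        return sum(m.gpu_mem_gb for m in self.models.values())

    def admit(self, model: CachedModel) -> None:
        """Load a model, evicting least-context models until it fits."""
        now = time.time()
        while self.models and self._used_gb() + model.gpu_mem_gb > self.capacity_gb:
            victim = min(self.models.values(), key=lambda m: m.context_value(now))
            del self.models[victim.name]
        self.models[model.name] = model

    def serve(self, name: str, task_relevance: float) -> bool:
        """Record a served request as a fresh in-context example; miss if uncached."""
        model = self.models.get(name)
        if model is None:
            return False
        model.demonstration.append(
            Example(created_at=time.time(), relevance=task_relevance))
        return True


if __name__ == "__main__":
    cache = EdgeModelCache(capacity_gb=24.0)
    cache.admit(CachedModel("stable-diffusion", gpu_mem_gb=10.0))
    cache.admit(CachedModel("gpt-style-llm", gpu_mem_gb=12.0))
    cache.serve("gpt-style-llm", task_relevance=0.9)
    cache.admit(CachedModel("vision-transformer", gpu_mem_gb=8.0))  # triggers eviction
    print(sorted(cache.models))  # the idle model with zero context value is evicted
```

Unlike LRU or LFU, which rank cached models by raw recency or frequency, this kind of policy ties eviction to the estimated value of the in-context examples a model is holding, which is the intuition behind balancing latency and energy against inference accuracy.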
