Paper information - LSG Attention: Extrapolation of pretrained Transformers to long sequences - 字舞流文

LSG Attention: Extrapolation of pretrained Transformers to long sequences

Charles Condevaux | S. Harispe