Transformer-based quality assessment model for generalized user-generated multimedia audio content