M$^2$IST: Multi-Modal Interactive Side-Tuning for Memory-efficient Referring Expression Comprehension