Integrating multiple modalities into SLMs and parsing their output