Multi-modal recommendation algorithm fusing visual and textual features