Towards Combining Object Detection and Text Classification Models for Form Entity Recognition