In this paper we address a basic limitation of SiamMask, a state-of-the-art algorithm for single-object tracking and segmentation. SiamMask is semi-supervised in that it requires a bounding box to be drawn manually around the object to be tracked. This is not always possible or feasible, and even in the best case it slows down the pipeline. We overcome this limitation by using two state-of-the-art object detection algorithms, Detectron2 and YOLO, to detect the object automatically and then track it with SiamMask. The main purpose of this study is to devise an efficient technique for end-to-end object detection and tracking that can then be used in applications such as self-driving cars. We compare the Detectron2-based and YOLO-based approaches for time efficiency and detection efficiency. A secondary aim is to test how the two approaches perform on different types of datasets. We find that YOLO gives better and more meaningful detections of the objects in a scene. Detectron2, however, detects objects faster than YOLO, which makes the overall detection and tracking process faster.