Fusion-driven deep feature network for enhanced object detection and tracking in video surveillance systems