Zero-shot Object Detection Through Vision-Language Embedding Alignment