Are metrics measuring what they should? An evaluation of image captioning task metrics