A robust experimental evaluation of automated multi-label classification methods