Information extraction from HTML product catalogues: coupling quantitative and knowledge-based approaches