Building an annotated corpus for the Albanian language using bilingual projections and regular expressions