Málrómur: A Manually Verified Corpus of Recorded Icelandic Speech
This paper describes the Málrómur corpus, an open, manually verified corpus of Icelandic speech. The recordings were collected in 2011–2012 by Reykjavik University and the Icelandic Center for Language Technology in cooperation with Google. In total, 152 hours of speech were recorded from 563 participants. Evaluators then listened to every recorded segment and judged whether it contained the utterance the participant was asked to read, and nothing else. Of the 127,286 recorded segments, 108,568 were approved and 18,718 were deemed unsatisfactory.
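As a quick consistency check on the figures quoted above, the following minimal Python sketch confirms that the approved and rejected counts sum to the total and derives a few summary ratios. The constant and function names are illustrative and do not come from the paper. The mean segment duration is only a rough estimate: it assumes the 152 hours cover all 127,286 recorded segments, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
TOTAL_SEGMENTS = 127_286
APPROVED_SEGMENTS = 108_568
REJECTED_SEGMENTS = 18_718
PARTICIPANTS = 563
HOURS_RECORDED = 152


def corpus_summary():
    """Derive simple ratios from the counts reported in the abstract."""
    # The two verification outcomes should account for every segment.
    if APPROVED_SEGMENTS + REJECTED_SEGMENTS != TOTAL_SEGMENTS:
        raise ValueError("approved + rejected does not equal the total")

    return {
        "approval_rate": APPROVED_SEGMENTS / TOTAL_SEGMENTS,
        "rejection_rate": REJECTED_SEGMENTS / TOTAL_SEGMENTS,
        "segments_per_participant": TOTAL_SEGMENTS / PARTICIPANTS,
        # Assumption: the 152 hours span all recorded segments.
        "mean_segment_seconds": HOURS_RECORDED * 3600 / TOTAL_SEGMENTS,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value:.3f}")
```

By these numbers, roughly 85% of the recorded segments passed manual verification, and each participant contributed about 226 segments on average.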