This paper describes the collection of PhoneBook, a phonetically rich, isolated-word, telephone-speech database. The collection was undertaken because of (1) the lack of available large-vocabulary isolated-word data, (2) the anticipated continued importance of isolated-word and keyword-spotting technology to speech-recognition-based applications over the telephone, and (3) findings that continuous-speech training data is inferior to isolated-word training data for isolated-word recognition. PhoneBook contains nearly 8000 distinct words, selected for complete coverage of phoneme contexts enumerated using both triphones and a novel method that takes into account syllable position, lexical stress, and non-adjacent-phoneme coarticulatory effects. It consists of more than 92000 utterances, averaging over 11 talkers per word. A demographically representative set of over 1300 native speakers of American English each made a single telephone call and read 75 words. The paper describes the word-list design, talker enrolment procedure, recording procedure and equipment, utterance verification method, and summary statistics for PhoneBook, which will be made available through the Linguistic Data Consortium.