KIDS: A database of children’s speech

A database of children reading aloud text appropriate to their age and reading level has been collected. This labeled data, to be distributed in the near future, was intended primarily for use in CMU’s LISTEN tutor, which employs speech recognition to monitor children’s reading and then helps correct errors. The speaker population was therefore chosen to represent both good and poor readers and to include the dialects of the speakers for whom the reading coach is intended. Phonemic balance could not be achieved (although it has been calculated), since the primary concern in recording children reading is to present sentences that first through third graders can read effectively. The text is a series of sentences adapted from the Weekly Reader series; most of the adaptation addressed the absence of the accompanying images. The text was chosen for its intrinsic interest and widespread use. Several trial recording sessions were used to develop a protocol that kept extraneous noises produced by the chi...