A Hindi speech recognizer for an agricultural video search application

Voice user interfaces for ICTD applications have immense potential because they can reach the large illiterate or semi-literate populations of regions where text-based interfaces are of little use. However, building speech systems for a new language is a highly resource-intensive task. Past work has attempted to develop techniques that circumvent the need for the large amounts of data and technical expertise required to build such systems. In this paper we present the development and evaluation of an application-specific speech recognizer for Hindi. We use the Salaam method [4], which bootstraps from a high-quality English speech engine, to develop a mobile, speech-based agricultural video search application for farmers in India. With very little training data for a 79-word vocabulary, we achieve accuracies above 90% in test and field deployments. We also report observations from the field that we believe are critical to the effective development and usability of speech applications in ICTD.

[1] Matthew Kam, et al. Improving literacy in developing countries using speech recognition-supported games on mobile devices, 2012, CHI.

[2] Mayank Dave, et al. Implementing a Speech Recognition System Interface for Indian Languages, 2008, IJCNLP.

[3] Ronald Rosenfeld, et al. Small-vocabulary speech recognition for resource-scarce languages, 2010, ACM DEV '10.

[4] Kentaro Toyama, et al. Full-context videos for first-time, non-literate PC users, 2007, 2007 International Conference on Information and Communication Technologies and Development.

[5] Rajesh Veeraraghavan, et al. Digital Green: Participatory video for agricultural extension, 2007, 2007 International Conference on Information and Communication Technologies and Development.

[6] Kuldeep Kumar. Hindi speech recognition system using HTK, 2011.

[7] Alexander I. Rudnicky, et al. Speech interfaces for information access by low literate users, 2009.

[8] Ronald Rosenfeld, et al. Discriminative pronunciation learning for speech recognition for resource scarce languages, 2012, ACM DEV '12.

[9] Ronald Rosenfeld, et al. Unexplored directions in spoken language technology for development, 2008, 2008 IEEE Spoken Language Technology Workshop.

[10] Bijumon Varghese, et al. The Malvi-speaking people of Madhya Pradesh and Rajasthan: a sociolinguistic profile, 2009.

[11] Science and technology for development: the new paradigm of ICT, 2007.

[12] Ronald Rosenfeld, et al. HealthLine: Speech-based access to health information by low-literate users, 2007, 2007 International Conference on Information and Communication Technologies and Development.

[13] Alex Waibel, et al. The GlobalPhone Project: Multilingual LVCSR with JANUS-3, 1997.

[14] Joyojeet Pal, et al. Speech Recognition for Illiterate Access to Information and Technology, 2006, 2006 International Conference on Information and Communication Technologies and Development.

[15] Ashish Verma, et al. A large-vocabulary continuous speech recognition system for Hindi, 2004, IBM J. Res. Dev.

[16] John F. Canny, et al. Mobile-izing health workers in rural India, 2010, CHI.

[17] Marc Uri Porat, et al. The information economy, 1976.

[18] Edward Cutrell, et al. VideoKheti: making video content accessible to low-literate and novice users, 2013, CHI.