Faceted search and browsing of audio content on Spoken Web

Spoken Web is a web of VoiceSites that can be accessed by phone, and the content of a VoiceSite is audio. Spoken Web therefore provides an alternative to the World Wide Web (WWW) in developing regions, where low Internet penetration and low literacy are barriers to accessing the conventional WWW. Searching audio content in Spoken Web through an audio query-result interface presents two key challenges: indexing of audio content is inaccurate, and results presented in audio must be played sequentially, which makes them cumbersome to browse. In this paper, we apply faceted search and browsing to the Spoken Web search problem. We use facets to index the metadata associated with the audio content, provide a mechanism to rank the facets based on the search results, and develop an interactive query interface that lets users browse search results through the top-ranked facets. To our knowledge, this is the first system to use facets in audio search, and the first solution to provide audio search for the rural population. We present quantitative results that illustrate the accuracy and effectiveness of the faceted search, and qualitative results that highlight the usability of the interactive browsing system. The experiments were conducted on more than 4000 audio documents collected from a live Spoken Web VoiceSite, and the evaluations were carried out with 40 farmers, the target users of the VoiceSite.
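The abstract does not say how facets are ranked against a result set. The sketch below illustrates the general idea under stated assumptions: each audio document's metadata is a flat set of facet-value pairs, and facets are scored by coverage-weighted entropy, a common heuristic for choosing which facet narrows a result list most. This scoring rule, the names `rank_facets` and `refine`, and the facet names (`crop`, `topic`) are hypothetical illustrations, not the ranking mechanism used in this work.

```python
import math
from collections import Counter

# Hypothetical metadata model: each audio document is a dict of
# facet -> value pairs (facet names here are illustrative assumptions).
Doc = dict[str, str]


def facet_counts(results: list[Doc]) -> dict[str, Counter]:
    """Count how often each value of each facet occurs in the result set."""
    counts: dict[str, Counter] = {}
    for doc in results:
        for facet, value in doc.items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts


def rank_facets(results: list[Doc]) -> list[tuple[str, float]]:
    """Rank facets by coverage-weighted entropy (an assumed heuristic).

    A facet scores high when most results carry it (coverage) and its
    values split those results into evenly sized groups (entropy), so
    choosing one value removes the most results on average.
    """
    n = len(results)
    scores = []
    for facet, counter in facet_counts(results).items():
        covered = sum(counter.values())
        # Shannon entropy in bits of the facet's value distribution.
        entropy = sum((c / covered) * math.log2(covered / c)
                      for c in counter.values())
        scores.append((facet, (covered / n) * entropy))
    # Highest score first; ties broken by facet name for determinism.
    return sorted(scores, key=lambda fs: (-fs[1], fs[0]))


def refine(results: list[Doc], facet: str, value: str) -> list[Doc]:
    """One browsing step: keep only results whose facet has the chosen value."""
    return [d for d in results if d.get(facet) == value]


if __name__ == "__main__":
    # Toy result set for a single query (hypothetical data).
    results = [
        {"crop": "cotton", "topic": "pest"},
        {"crop": "wheat", "topic": "pest"},
        {"crop": "cotton", "topic": "pest"},
        {"crop": "wheat", "topic": "price"},
    ]
    ranked = rank_facets(results)
    print(ranked)  # "crop" ranks first: its values split the results evenly
    print(refine(results, ranked[0][0], "cotton"))
```

In an audio interface, only the names of the top one or two facets (and their values) would be read out at each step, so the caller hears a short menu instead of a long sequential list; after each choice, `refine` narrows the results and the facets are re-ranked on the smaller set.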
