An introduction to voice search

Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request via a spoken query. The information normally resides in a large database, and the query must be matched against a field in that database to retrieve the relevant entries. The contents of the field, such as business or product names, are often unstructured text.

This article categorizes spoken dialog technology, from a technological perspective, into form filling, call routing, and voice search, and reviews voice search technology. It is important to note that a single SDS may combine technology from multiple categories.

Robustness is the central issue in voice search. Acoustic modeling aims at improved robustness to environmental noise, channel conditions, and speaker variability; pronunciation research addresses unseen-word pronunciation and pronunciation variation; language modeling focuses on linguistic variation; research on search improves robustness to both linguistic variation and ASR errors; dialog management enables graceful recovery from confusions and understanding errors; and learning in the feedback loop speeds up system tuning for more robust performance. While tremendous progress has been made on voice search over the past decade, significant challenges remain: many voice search dialog systems achieve automation rates at or below 50% in field trials.
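
The core matching step described above can be illustrated with a minimal sketch. This is not the method of the article, only a plausible baseline: character n-gram cosine similarity between an ASR hypothesis and unstructured listing names, with a hypothetical score threshold deciding whether a dialog manager accepts the top result or asks the user to confirm. The listings, query, and threshold below are invented for illustration.

```python
# Minimal sketch: fuzzy matching of an ASR hypothesis against listing names.
# Character n-grams give some tolerance to ASR errors and spelling variants.

from collections import Counter
import math
import re


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a normalized string."""
    s = re.sub(r"[^a-z0-9 ]", "", text.lower())
    padded = f" {s} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(asr_hypothesis: str, listings: list[str]) -> tuple[str, float]:
    """Return the best-matching listing and its similarity score."""
    query = char_ngrams(asr_hypothesis)
    return max(((name, cosine(query, char_ngrams(name))) for name in listings),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical directory listings and an imperfect, ASR-style query.
    listings = ["Joe's Pizza and Pasta", "Jose's Mexican Grill", "Pizza Hut"]
    best, score = search("joes pizza", listings)
    # A dialog manager might act on high-confidence matches and confirm the rest;
    # the 0.6 threshold is an arbitrary placeholder, not a recommended value.
    action = "accept" if score > 0.6 else "confirm with user"
    print(best, round(score, 2), action)
```

In a deployed system this lexical score would typically be combined with ASR confidence measures and dialog context rather than used on its own.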
