VOICECONET: A Collaborative Framework for Speech-Based Computer Accessibility with a Case Study for Brazilian Portuguese

In recent years, ever faster processors have raised the performance of personal computers enough to make speech processing practical in computer-assisted education. Of the speech technologies that are effective in education, text-to-speech (TTS) and automatic speech recognition (ASR) are the most prominent. TTS systems [45] are software modules that convert natural language text into synthesized speech. ASR [18] can be seen as the inverse of TTS: a digitized speech signal, captured for example via a microphone, is converted into text.

[1] Jorge Proença, et al. Computational Processing of the Portuguese Language, 2014, Lecture Notes in Computer Science.

[2] Paul Taylor, et al. Festival Speech Synthesis System, 1998.

[3] S. Ramakrishnan. Modern Speech Recognition Approaches with Case Studies, 2012.

[4] Geir Gunnarsson. Data Driven Methods in Speech Synthesis, 2005.

[5] Patrick Silva, et al. An Open-Source Speech Recognizer for Brazilian Portuguese with a Windows Programming Interface, 2010, PROPOR.

[6] Kiyohiro Shikano, et al. Julius - an open source real-time large vocabulary recognition engine, 2001, INTERSPEECH.

[7] John J. Godfrey, et al. SWITCHBOARD: telephone speech corpus for research and development, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Thierry Dutoit, et al. The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[9] Tzu-Hua Wang, et al. Web-based dynamic assessment: Taking assessment as teaching and learning strategy for improving students' e-Learning effectiveness, 2010, Comput. Educ.

[10] António J. S. Teixeira, et al. On the Use of Machine Learning and Syllable Information in European Portuguese Grapheme-Phone Conversion, 2006, PROPOR.

[11] Gheorghe Sabau, et al. Collaborative Network for the Development of an Informational System in the SOA Context for the University Management, 2009, 2009 International Conference on Computer Technology and Development.

[12] Giuliano Antoniol, et al. Radiological Reporting Based on Voice Recognition, 1993, EWHCI.

[13] Jean-Luc Gauvain, et al. Speaker adaptation based on MAP estimation of HMM parameters, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14] Alex Acero, et al. Spoken Language Processing, 2001.

[15] A. A. Fidalgo-Neto, et al. The use of computers in Brazilian primary and secondary schools, 2009, Comput. Educ.

[16] Michael F. McTear, et al. Speech recognition in the secondary school classroom: an exploratory study, 1999, Comput. Educ.

[17] Oscar Saz-Torralba, et al. Tools and Technologies for Computer-Aided Speech and Language Therapy, 2009, Speech Commun.

[18] Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[19] Aldebaro Klautau, et al. A Computer-assisted Learning Software Using Speech Synthesis and Recognition in Brazilian Portuguese, 2009.

[20] András Kornai. Extended finite state models of language, 1996, Nat. Lang. Eng.

[21] D.C. Silva, et al. A rule-based grapheme-phone converter and stress determination for Brazilian Portuguese natural language processing, 2006, 2006 International Telecommunications Symposium.

[22] Jane Seale, et al. E-learning and accessibility: An exploration of the potential role of generic pedagogical tools, 2010, Comput. Educ.

[23] Paul Lamere, et al. Sphinx-4: a flexible open source framework for speech recognition, 2004.

[24] Robert Agranoff. Inside Collaborative Networks: Ten Lessons for Public Managers, 2006.

[25] Alexander I. Rudnicky, et al. Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[26] Heiga Zen, et al. An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics, 2015. DOI: 10.14209/jcis.2006.11.

[27] Marc Schröder, et al. The German Text-to-Speech Synthesis System MARY: A Tool for Research, Development and Teaching, 2003, Int. J. Speech Technol.

[28] Keiichi Tokuda, et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.

[29] Isabel Trancoso, et al. Grapheme-to-phone using finite-state transducers, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[30] Информатика, et al. Microsoft Speech API, 2010.

[31] Maria Helena Mira Mateus. Introdução a estudos de fonologia do português brasileiro, 2000.

[32] N. Deshmukh, et al. Hierarchical search for large-vocabulary conversational speech recognition: working toward a solution to the decoding problem, 1999.

[33] Maxine Eskénazi, et al. An overview of spoken language technology for education, 2009, Speech Commun.

[34] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[35] Paul Taylor, et al. Text-to-Speech Synthesis, 2009.

[36] Dante Barone, et al. A Brazilian Portuguese language corpus development, 2000, INTERSPEECH.

[37] Jelena Kovacevic, et al. Reproducible research in signal processing, 2009, IEEE Signal Process. Mag.