An Assistive Reading System for Visually Impaired using OCR and TTS

Reading machines are mechatronic devices that use optical character recognition and text-to-speech technology to produce synthetic speech from printed text. This paper proposes an assistive system for visually impaired or blind persons. It reads textual information on paper and produces the corresponding speech using OCR (optical character recognition) and TTS (text-to-speech) systems. To localize text regions, a connected component labeling approach using histogram analysis is applied to the binarized image. The TTS system uses concatenative synthesis and is built on an SDK (software development kit) platform. The system is operated through a voice-based user interface and also provides a user-friendly GUI (graphical user interface) for scanning text and controlling various speech parameters. The speech signal produced can be saved and replayed later.
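
The abstract does not give implementation details for the text localization step, so the following is only a minimal sketch of the general idea: binarize a grayscale page, use a row histogram of ink pixels to find candidate text-line bands, and label connected components to obtain bounding boxes. The fixed threshold of 128, the choice of 4-connectivity, and all function names (`binarize`, `row_histogram`, `text_line_bands`, `connected_components`) are assumptions made for illustration, not the authors' method.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0.
    The fixed threshold is an assumption made for this sketch."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def row_histogram(binary):
    """Count ink pixels in each image row (a horizontal projection profile)."""
    return [sum(row) for row in binary]


def text_line_bands(hist, min_ink=1):
    """Return inclusive (top, bottom) row ranges whose ink count is >= min_ink."""
    bands, start = [], None
    for y, count in enumerate(hist):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(hist) - 1))
    return bands


def connected_components(binary):
    """Label 4-connected ink regions with a breadth-first search.
    Returns one bounding box (top, left, bottom, right) per region,
    in raster-scan order of each region's first pixel."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    # Tiny synthetic "page": 0 is ink, 255 is paper.
    page = [
        [255, 255, 255, 255, 255, 255],
        [255,   0,   0, 255,   0, 255],
        [255,   0, 255, 255,   0, 255],
        [255, 255, 255, 255, 255, 255],
        [255,   0,   0,   0, 255, 255],
        [255, 255, 255, 255, 255, 255],
    ]
    binary = binarize(page)
    print(text_line_bands(row_histogram(binary)))
    print(connected_components(binary))
```

In a fuller pipeline, the component boxes falling inside each histogram band would typically be grouped into words or lines before being passed to the OCR engine; that grouping is outside the scope of this sketch.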
