Machine Learning Methods Applied to Word Prediction for Brazilian Portuguese

People with physical disabilities may have serious difficulty using computer keyboards to write. For this reason, they may use specialized tools that assist the writing process, such as word prediction, to reduce the number of keystrokes needed to write a text. Word prediction may draw on different sources of information: statistical, grammatical, subject-specific and/or user-specific, etc. In this paper we improve the quality of word prediction in Brazilian Portuguese by improving the prediction of the part of speech (POS) of the predicted word. We propose the following methods to predict the POS: artificial neural networks, support vector machines, regularized logistic models, and a naive Bayes classifier. When included in the word prediction system, they save between 32.55% and 34.58% of the keystrokes needed to write the text.
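To make the keystroke-savings figure concrete, the following is a minimal sketch of how such a percentage might be computed. It assumes the common definition of keystroke savings (one minus the ratio of keystrokes used with prediction to keystrokes needed without it) and a toy frequency-ranked prefix predictor. The function names, the toy vocabulary, and the convention that selecting a suggestion costs one keystroke and also inserts the trailing space are illustrative assumptions, not the system described in the paper.

```python
from collections import Counter


def baseline_keystrokes(words):
    """Keystrokes without prediction: every letter plus one space per word."""
    return sum(len(word) + 1 for word in words)


def keystrokes_with_prediction(words, vocab_counts, list_size=5):
    """Keystrokes when the user selects a word as soon as it is suggested.

    Assumption: suggestions are the `list_size` most frequent vocabulary
    words that start with the prefix typed so far.
    """
    ranked = [w for w, _ in vocab_counts.most_common()]
    total = 0
    for word in words:
        for typed in range(len(word) + 1):
            prefix = word[:typed]
            suggestions = [w for w in ranked if w.startswith(prefix)][:list_size]
            if word in suggestions:
                # Letters typed so far plus one keystroke to select.
                total += typed + 1
                break
        else:
            # Never suggested: type the whole word plus a space.
            total += len(word) + 1
    return total


def keystroke_savings(baseline, actual):
    """Percentage of keystrokes saved relative to typing everything."""
    return 100.0 * (1.0 - actual / baseline)


# Hypothetical toy data, for illustration only.
vocab = Counter({"casa": 5, "carro": 3, "cachorro": 2})
text = ["casa", "carro"]

base = baseline_keystrokes(text)
used = keystrokes_with_prediction(text, vocab, list_size=1)
print(f"baseline={base}, with prediction={used}, "
      f"savings={keystroke_savings(base, used):.2f}%")
```

In this sketch, a better ranking of candidates (for instance, one informed by a predicted POS) would surface the intended word after fewer typed letters, which is what raises the savings percentage.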
