An Investigative Design of an Optimum Stochastic Language Model for Bangla Autocomplete

Word completion and word prediction are two important typing aids. They can greatly help people with disabilities, and students, who use a keyboard or a similar input device. Autocomplete techniques also help students during the learning process by suggesting suitable keywords for web searches. Much work has been done for English, but work on Bangla is still very limited, and the metrics used to evaluate performance are not yet rigorous. Bangla is one of the most widely spoken languages (spoken by 3.05% of the world population) and ranks seventh among all languages in the world. In this paper, stochastic (i.e., N-gram-based) language models are proposed for Bangla word prediction. These models autocomplete a sentence by predicting a set of words, rather than the single word predicted in previous work. A novel approach is proposed to find the optimum language model based on a performance metric. In addition, a large Bangla corpus covering different word types is used to improve performance.
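The abstract does not specify the smoothing method, the size of the predicted word set, or the exact performance metric. The sketch below is therefore only a minimal illustration under stated assumptions, not the authors' method. It uses plain count-based N-gram models with a simple back-off to shorter contexts, and a top-k "hit rate" as the metric: the fraction of positions where the true next word appears among the k suggestions. It then picks the N with the best score. The helper names (`train_ngram`, `predict_top_k`, `top_k_hit_rate`, `select_optimum_n`) and the `<s>` padding token are hypothetical.

```python
from collections import Counter, defaultdict

PAD = "<s>"  # hypothetical sentence-start token used to pad short histories


def train_ngram(sentences, n):
    """Count next-word frequencies for every context length 0..n-1 (assumed scheme)."""
    model = defaultdict(Counter)  # context tuple -> Counter of next words
    for sentence in sentences:
        tokens = [PAD] * (n - 1) + sentence.split()
        for i in range(n - 1, len(tokens)):
            for order in range(n):  # order 0 = unigram, order n-1 = full N-gram
                model[tuple(tokens[i - order:i])][tokens[i]] += 1
    return model


def predict_top_k(model, n, history, k=3):
    """Suggest up to k next words, backing off from the longest context."""
    tokens = [PAD] * (n - 1) + history.split()
    suggestions = []
    for order in range(n - 1, -1, -1):
        counts = model.get(tuple(tokens[len(tokens) - order:]))
        if not counts:
            continue  # unseen context: fall back to a shorter one
        # Most frequent first; ties broken by string order for determinism.
        for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                return suggestions
    return suggestions


def top_k_hit_rate(model, n, sentences, k=3):
    """Assumed metric: share of positions whose true word is in the top-k set."""
    hits = total = 0
    for sentence in sentences:
        words = sentence.split()
        for i, target in enumerate(words):
            hits += target in predict_top_k(model, n, " ".join(words[:i]), k)
            total += 1
    return hits / total if total else 0.0


def select_optimum_n(train, test, max_n=4, k=3):
    """Score N = 1..max_n on the test set; prefer the smaller N on ties."""
    scores = {n: top_k_hit_rate(train_ngram(train, n), n, test, k)
              for n in range(1, max_n + 1)}
    best = max(scores, key=lambda n: (scores[n], -n))
    return best, scores


if __name__ == "__main__":
    corpus = ["আমি ভাত খাই", "আমি বই কিনি", "আমি ভাত খাই", "তুমি ভাত খাও"]
    bigram = train_ngram(corpus, 2)
    print(predict_top_k(bigram, 2, "আমি", k=2))  # ['ভাত', 'বই']
    print(select_optimum_n(corpus, ["আমি ভাত খাই"], max_n=3, k=2))
```

On this toy corpus with k = 2, the bigram and trigram models both reach a hit rate of 1.0, while the unigram model reaches 2/3. The tie-break therefore returns N = 2. In a real evaluation the test sentences would come from a held-out split of the corpus rather than from the training data as they do here. The tie-break toward a smaller N is itself an assumption: it favours the cheaper model when scores are equal.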
