A Hybrid Approach Towards Two-Stage Bengali Question Classification Utilizing Smart Data Balancing Technique

Question classification (QC) is the first step of a Question Answering (QA) system. A QC system assigns each question to a predefined class so that the QA system can return a correct answer. Our system classifies factoid questions asked in natural language after extracting features from them. We present a two-stage QC system for Bengali. In the first stage, a one-dimensional convolutional neural network (1D CNN) classifies questions into coarse classes. Word2vec representations of the words in the question corpus were constructed and used as input to the 1D CNN. A smart data balancing technique gives the data-hungry CNN a greater number of effective samples to learn from. In the second stage, each coarse class has its own Stochastic Gradient Descent (SGD) based classifier, which differentiates among the finer classes within that coarse class. These SGD classifiers use the TF-IDF representation of each word as features. Experiments show the effectiveness of the proposed method for Bengali question classification.
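
The abstract does not state the first-stage CNN's configuration, such as embedding size, filter widths, number of filters, or pooling. The following is therefore a minimal forward-pass sketch of a generic one-dimensional text CNN, not the authors' network. It assumes a single convolution layer with ReLU, max-over-time pooling, and a softmax layer over coarse classes. It shows how word2vec vectors feed the convolution: each question becomes a sequence of word vectors, and each filter slides along the word axis. All function names, the romanized toy tokens ("ke" = who, "kothay" = where, "chilen" = was), and the hand-set weights are hypothetical. Training by backpropagation is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical filter format: (kernel rows, one per word position in the window; bias).
Filter = Tuple[List[Vector], float]


def embed(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up each token's word vector; out-of-vocabulary tokens get a zero vector."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def conv1d_max(seq: List[Vector], kernel: List[Vector], bias: float) -> float:
    """Slide one filter along the word axis, apply ReLU, and keep the largest
    activation (max-over-time pooling). Returns 0.0 if the question is shorter
    than the filter width, because then no window fits."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting point
    for start in range(len(seq) - width + 1):
        s = bias
        for offset in range(width):
            # Dot product of one kernel row with the word vector at this position.
            s += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        best = max(best, max(0.0, s))  # ReLU, then running max over positions
    return best


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_probs(tokens: Sequence[str], table: Dict[str, Vector], dim: int,
                 filters: List[Filter], dense_w: List[Vector],
                 dense_b: List[float]) -> List[float]:
    """Forward pass: embeddings -> one pooled feature per filter -> softmax over coarse classes."""
    seq = embed(tokens, table, dim)
    features = [conv1d_max(seq, kernel, bias) for kernel, bias in filters]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 2-d "word2vec" table and hand-set weights, purely illustrative.
    table = {"ke": [1.0, 0.0], "kothay": [0.0, 1.0], "chilen": [0.2, 0.2]}
    filters = [([[1.0, 0.0], [0.0, 0.0]], 0.0),  # width-2 filter keyed to "who"-like vectors
               ([[0.0, 1.0], [0.0, 0.0]], 0.0)]  # width-2 filter keyed to "where"-like vectors
    dense_w, dense_b = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]  # rows: HUMAN, LOCATION
    print(coarse_probs(["ke", "chilen"], table, 2, filters, dense_w, dense_b))
```

In a trained model, the kernels and dense weights would be learned from the (balanced) training questions rather than set by hand.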
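
The second stage can be sketched in the standard library as follows. It is a minimal illustration under stated assumptions, not the authors' code. The sketch assumes one IDF table shared across all training questions. The TF-IDF weighting follows the default formula of scikit-learn's TfidfVectorizer: raw counts, smoothed IDF, and L2 normalization. Each coarse class gets its own one-vs-rest linear model, trained with constant-learning-rate SGD on the hinge loss with an L2 penalty; hinge and L2 are also SGDClassifier's default loss and penalty. The paper's actual hyperparameters are not given in the abstract. All function and class names, the hyperparameter values, the labels (HUM, LOC, ind, gr, city, country), and the romanized toy questions are hypothetical. The coarse label passed to `classify_fine` stands in for the first-stage CNN's prediction.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

SparseVec = Dict[str, float]
Sample = Tuple[List[str], str, str]  # (tokens, coarse label, fine label)


def fit_idf(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(doc: Sequence[str], idf: Dict[str, float]) -> SparseVec:
    """Raw term count times IDF, L2-normalised; tokens unseen in training are dropped."""
    counts = Counter(tok for tok in doc if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


class SgdOvr:
    """One-vs-rest linear classifier: SGD on hinge loss + (alpha / 2) * ||w||^2."""

    def __init__(self, epochs: int = 20, lr: float = 0.1, alpha: float = 1e-4, seed: int = 0):
        self.epochs, self.lr, self.alpha, self.seed = epochs, lr, alpha, seed

    @staticmethod
    def _score(w: SparseVec, b: float, x: SparseVec) -> float:
        return b + sum(w.get(k, 0.0) * v for k, v in x.items())

    def fit(self, X: List[SparseVec], y: List[str]) -> "SgdOvr":
        self.classes = sorted(set(y))
        self.w: Dict[str, SparseVec] = {c: {} for c in self.classes}
        self.b: Dict[str, float] = {c: 0.0 for c in self.classes}
        rng = random.Random(self.seed)
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                for c in self.classes:
                    target = 1.0 if y[i] == c else -1.0
                    w = self.w[c]
                    margin = target * self._score(w, self.b[c], X[i])
                    for k in w:  # gradient step on the L2 penalty
                        w[k] *= 1.0 - self.lr * self.alpha
                    if margin < 1.0:  # hinge subgradient is non-zero only inside the margin
                        for k, v in X[i].items():
                            w[k] = w.get(k, 0.0) + self.lr * target * v
                        self.b[c] += self.lr * target
        return self

    def predict(self, x: SparseVec) -> str:
        return max(self.classes, key=lambda c: self._score(self.w[c], self.b[c], x))


def train_stage_two(samples: Sequence[Sample]) -> Tuple[Dict[str, float], Dict[str, SgdOvr]]:
    """Fit a shared IDF table, then one SGD model per coarse class, each trained
    only on the questions that belong to that coarse class."""
    idf = fit_idf([toks for toks, _, _ in samples])
    models: Dict[str, SgdOvr] = {}
    for coarse in sorted({c for _, c, _ in samples}):
        subset = [(toks, fine) for toks, c, fine in samples if c == coarse]
        X = [tfidf(toks, idf) for toks, _ in subset]
        models[coarse] = SgdOvr().fit(X, [fine for _, fine in subset])
    return idf, models


def classify_fine(tokens: List[str], coarse: str, idf: Dict[str, float],
                  models: Dict[str, SgdOvr]) -> str:
    """Route the question to the fine-grained model of its (stage-1) coarse class."""
    return models[coarse].predict(tfidf(tokens, idf))


if __name__ == "__main__":
    samples = [(["ke", "kobi"], "HUM", "ind"), (["ke", "lekhok"], "HUM", "ind"),
               (["kon", "dol"], "HUM", "gr"), (["kon", "songothon"], "HUM", "gr"),
               (["kon", "shohor"], "LOC", "city"), (["kon", "desh"], "LOC", "country")]
    idf, models = train_stage_two(samples)
    print(classify_fine(["ke", "kobi", "chilen"], "HUM", idf, models))
```

Training one small model per coarse class keeps each fine-grained decision local: the LOC model never has to separate "city" from "ind", only from the other location types.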
