Question answering system supporting vector machine method for hadith domain

Retrieving accurate answers to users’ queries is the central problem of question answering systems. Challenges such as analysing the information need behind a query and extracting accurate answers from a large corpus make it difficult to develop an effective question answering system. This work aims to improve the accuracy of a question answering system for hadiths. Pre-processing methods such as tokenization and stop-word removal are used to identify the main concepts of a user’s query. Answer-processing methods and techniques such as N-gram, WordNet, CS, and LCS are used to update and enrich the extracted query concepts based on the formal representation of the hadith answers or documents. Support Vector Machine (SVM) and Named Entity Recognition (NER) methods are applied to classify hadith documents by relevant subject and question type in order to reduce the search scope for answer documents. Documents in the hadith corpus are classified, according to the proposed question types and related subjects, into four main classes: when for prayer, where for prayer, when for fasting, and where for fasting. The SVM classification is supported by NER methods that identify the place (where) and time (when) features contained in the documents. The proposed question answering system is tested on 132 hadith documents about fasting and prayer selected from the Al-Bukhari source. The findings show an average answer accuracy of 67% using the CS technique, 66% using the LCS technique, 70% using the combination of CS and LCS, and 80% using CS, LCS, and SVM together. SVM classification thus improves system accuracy by up to 10 percentage points over the same methods without classification.
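The similarity step described above can be illustrated with a minimal Python sketch. It assumes that CS denotes cosine similarity over term counts and that LCS denotes longest common subsequence over tokens, which the abstract does not spell out. The stop-word list, the function names, and the equal-weight combination are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "in", "for", "and"}


def preprocess(text):
    """Lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length normalized by the longer token list (one possible normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def combined_score(query, document, weight=0.5):
    """Weighted mix of the two measures; the weighting scheme is hypothetical."""
    q, d = preprocess(query), preprocess(document)
    return weight * cosine_similarity(q, d) + (1 - weight) * lcs_similarity(q, d)
```

Ranking candidate documents by `combined_score` and returning the top one would be one simple way to use the two measures together; how the paper actually combines them is not stated in the abstract.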
The main contribution of this research is the use of the SVM method to reduce the search scope of hadith documents based on subjects and question types, alongside effective analysis of the query need using NLP methods. Adding SVM classification provides more accurate answers than extracting answers using only similarity techniques such as CS and LCS.
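The idea of narrowing the search scope can be sketched as follows. This example assumes each document already carries one of the four class labels (in the paper these come from SVM supported by NER). Here a simple keyword rule stands in for the trained classifier on the query side, and a token-overlap score stands in for the similarity techniques. All names, keyword sets, and label strings are hypothetical.

```python
import re

# Hypothetical subject keywords; the paper's actual features are not given.
SUBJECT_KEYWORDS = {
    "prayer": {"pray", "prayer", "prayers"},
    "fasting": {"fast", "fasting"},
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def query_class(query):
    """Map a query to a label such as 'when-fasting', or None if unclear.

    A keyword rule is used here as a stand-in for the SVM + NER step.
    """
    tokens = tokenize(query)
    qtype = next((t for t in tokens if t in ("when", "where")), None)
    subject = next(
        (s for s, kws in SUBJECT_KEYWORDS.items() if kws & set(tokens)), None
    )
    if qtype is None or subject is None:
        return None
    return f"{qtype}-{subject}"


def overlap(query, text):
    """Jaccard overlap of token sets, a placeholder for CS/LCS scoring."""
    q, d = set(tokenize(query)), set(tokenize(text))
    return len(q & d) / len(q | d) if q | d else 0.0


def answer(query, labelled_docs):
    """Return the best (label, text) pair, searching only the matching class.

    Falls back to the whole corpus when the query class is unknown or empty.
    """
    if not labelled_docs:
        return None
    label = query_class(query)
    candidates = [d for d in labelled_docs if d[0] == label] or labelled_docs
    return max(candidates, key=lambda d: overlap(query, d[1]))


# Placeholder documents (not real hadith text), labelled by class.
docs = [
    ("when-fasting", "placeholder text about the time fasting begins"),
    ("where-prayer", "placeholder text about the place of prayer"),
]
best = answer("When does fasting begin?", docs)
```

Restricting the candidates to one class means the similarity scores are computed over fewer documents, which is the scope reduction the abstract credits for the accuracy gain.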
