Improving educational web search for question-like queries through subject classification

Abstract: Students use general web search engines as their primary research tool when trying to find answers to school-related questions. Although search engines return highly relevant results for the general population, they may return results that fall outside the educational context. Social community question answering websites, another rising trend, are the second choice for students who try to get answers from peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built with an ensemble method that combines several regular learning algorithms with retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is based entirely on the Turkish language, the features could easily be mapped to other languages. To find out whether search engine ranking in the education domain can be improved using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking, based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, across varying query lengths, and separately for factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
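The idea that a query-document category match signals relevance can be illustrated with a minimal sketch. This is not one of the paper's five methods, which the abstract does not specify. The `Result` type, the `rerank` function, the reciprocal-rank score, and the `boost` value are all hypothetical choices made for illustration, and the subject labels are assumed to come from some classifier.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    rank: int       # 1-based position returned by the general search engine
    category: str   # subject label predicted for the document (assumed given)


def rerank(query_category, results, boost=0.6):
    """Move documents that share the query's subject category up the list.

    Score = 1 / original rank, plus `boost` when the categories match.
    Both the formula and the default `boost` are illustrative assumptions.
    """
    def score(result):
        match = 1.0 if result.category == query_category else 0.0
        return 1.0 / result.rank + boost * match

    return sorted(results, key=score, reverse=True)


results = [
    Result("a.example/history", 1, "history"),
    Result("b.example/physics", 2, "physics"),
    Result("c.example/physics", 3, "physics"),
]
reranked = rerank("physics", results)
print([r.url for r in reranked])
```

With these toy inputs, the physics page originally at rank 2 moves above the off-topic history page, while the rank 3 physics page stays below it because the boost does not outweigh the original ranking entirely.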
