Personalizing web search results by reading level

Traditionally, search engines have ignored the reading difficulty of documents and the reading proficiency of users in computing a document ranking. This is one reason why Web search engines do a poor job of serving an important segment of the population: children. While there are many important problems in interface design, content filtering, and results presentation related to addressing children's search needs, perhaps the most fundamental challenge is simply that of providing relevant results at the right level of reading difficulty. At the opposite end of the proficiency spectrum, it may also be valuable for technical users to find more advanced material or to filter out material at lower levels of difficulty, such as tutorials and introductory texts. We show how reading level can provide a valuable new relevance signal for both general and personalized Web search. We describe models and algorithms to address the three key problems in improving relevance for search using reading difficulty: estimating user proficiency, estimating result difficulty, and re-ranking based on the difference between user and result reading level profiles. We evaluate our methods on a large volume of Web query traffic and provide a large-scale log analysis that highlights the importance of finding results at an appropriate reading level for the user.
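The three problems named above (estimating user proficiency, estimating result difficulty, and re-ranking on the difference between the two) can be illustrated with a minimal sketch. It is not the paper's actual model. It assumes, purely for illustration, that a reading-level profile is a probability distribution over grade levels 1 to 12, that a user's profile is the average of the profiles of documents they previously clicked, and that re-ranking subtracts a penalty proportional to the gap in expected grade level. The names `point_profile`, `expected_level`, `user_profile`, `rerank`, and the `weight` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: reading difficulty is expressed on a 1..12 grade-level scale.
GRADES = range(1, 13)


def point_profile(grade: int) -> List[float]:
    """Profile that puts all probability mass on a single grade level."""
    return [1.0 if g == grade else 0.0 for g in GRADES]


def expected_level(profile: Sequence[float]) -> float:
    """Expected grade level of a (possibly unnormalized) profile."""
    total = sum(profile)
    return sum(g * p for g, p in zip(GRADES, profile)) / total


def user_profile(clicked: Sequence[Sequence[float]]) -> List[float]:
    """Assumed proficiency estimate: mean of the clicked documents' profiles."""
    n = len(clicked)
    return [sum(p[i] for p in clicked) / n for i in range(len(GRADES))]


def rerank(
    results: Sequence[Tuple[str, float, Sequence[float]]],
    user: Sequence[float],
    weight: float = 0.1,
) -> List[Tuple[str, float]]:
    """Re-score (doc_id, base_score, profile) triples by reading-level gap.

    The penalty is weight * |E[doc level] - E[user level]|; this scoring
    rule is a hypothetical stand-in, not the method evaluated in the paper.
    """
    target = expected_level(user)
    rescored = [
        (doc_id, score - weight * abs(expected_level(profile) - target))
        for doc_id, score, profile in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A user whose past clicks were on grade 3 and grade 5 documents.
    user = user_profile([point_profile(3), point_profile(5)])
    results = [
        ("advanced-survey", 1.0, point_profile(11)),
        ("intro-tutorial", 0.9, point_profile(4)),
    ]
    for doc_id, score in rerank(results, user):
        print(doc_id, round(score, 3))
```

In this toy setup the introductory document overtakes the advanced one for a user with a low estimated level, even though its base relevance score is slightly lower. Using only the expected grade level discards the shape of the distributions; a divergence between the full profiles would be a natural alternative under the same assumptions.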
