To each his own: personalized content selection based on text comprehensibility

Imagine a physician and a patient doing a search on antibiotic resistance. Or a chess amateur and a grandmaster conducting a search on Alekhine's Defence. Although the topic is the same, arguably the two users in each case will satisfy their information needs with very different texts. Yet today search engines mostly adopt the one-size-fits-all solution, where personalization is restricted to topical preference. We found that users do not uniformly prefer simple texts, and that the text comprehensibility level should match the user's level of preparedness. Consequently, we propose to model the comprehensibility of texts as well as the users' reading proficiency in order to better explain how different users choose content for further exploration. We also model topic-specific reading proficiency, which allows us to better explain why a physician might choose to read sophisticated medical articles yet simple descriptions of SLR cameras. We explore different ways to build user profiles, and use collaborative filtering techniques to overcome data sparsity. We conducted experiments on large-scale datasets from a major Web search engine and a community question answering forum. Our findings confirm that explicitly modeling text comprehensibility can significantly improve content ranking (search results or answers, respectively).

[1]  Deepak Agarwal,et al.  Regression-based latent factor models , 2009, KDD.

[2]  Mihai Surdeanu,et al.  Learning to Rank Answers on Large Online QA Collections , 2008, ACL.

[3]  Tapas Kanungo,et al.  Predicting the readability of short web summaries , 2009, WSDM '09.

[4]  Alexander J. Smola,et al.  Maximum Margin Matrix Factorization for Collaborative Ranking , 2007 .

[5]  Nathan Srebro,et al.  Fast maximum margin matrix factorization for collaborative prediction , 2005, ICML.

[6]  R. Gunning The Technique of Clear Writing. , 1968 .

[7]  G. M. McClure Readability formulas: Useful or useless? , 1987, IEEE Transactions on Professional Communication.

[8]  Ryen W. White,et al.  Personalizing web search results by reading level , 2011, CIKM '11.

[9]  Andrei Broder,et al.  A taxonomy of web search , 2002, SIGF.

[10]  Huan Liu,et al.  CubeSVD: a novel approach to personalized Web search , 2005, WWW '05.

[11]  Jasmine Novak,et al.  Building enriched document representations using aggregated anchor text , 2009, SIGIR.

[12]  Susan T. Dumais,et al.  To personalize or not to personalize: modeling queries with variation in user intent , 2008, SIGIR '08.

[13]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[14]  Fabio Crestani,et al.  Towards query log based personalization using topic models , 2010, CIKM.

[15]  M. Coleman,et al.  A computer readability formula designed for machine scoring. , 1975 .

[16]  Kevyn Collins-Thompson,et al.  A Language Modeling Approach to Predicting Reading Difficulty , 2004, NAACL.

[17]  Andrei Z. Broder,et al.  Exploiting site-level information to improve web search , 2010, CIKM '10.

[18]  Filip Radlinski,et al.  Query chains: learning to rank from implicit feedback , 2005, KDD '05.

[19]  Jaime Teevan,et al.  Information re-retrieval: repeat queries in Yahoo's logs , 2007, SIGIR.

[20]  W. Bruce Croft,et al.  Quality-biased ranking of web documents , 2011, WSDM '11.

[21]  Ani Nenkova,et al.  Revisiting Readability: A Unified Framework for Predicting Text Quality , 2008, EMNLP.

[22]  Thorsten Joachims,et al.  Accurately Interpreting Clickthrough Data as Implicit Feedback , 2017 .

[23]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[24]  R. P. Fishburne,et al.  Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel , 1975 .

[25]  Filip Radlinski,et al.  Personalizing web search using long term browsing history , 2011, WSDM '11.

[26]  Olivier Chapelle,et al.  A dynamic bayesian network click model for web search ranking , 2009, WWW '09.

[27]  Mari Ostendorf,et al.  Reading Level Assessment Using Support Vector Machines and Statistical Language Models , 2005, ACL.

[28]  Lada A. Adamic,et al.  Knowledge sharing and yahoo answers: everyone knows something , 2008, WWW.

[29]  Ryen W. White,et al.  Characterizing the influence of domain expertise on web search behavior , 2009, WSDM '09.

[30]  Ji-Rong Wen,et al.  WWW 2007 / Track: Search Session: Personalization A Largescale Evaluation and Analysis of Personalized Search Strategies ABSTRACT , 2022 .

[31]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[32]  Susan T. Dumais,et al.  Personalizing Search via Automated Analysis of Interests and Activities , 2005, SIGIR.

[33]  Andrei Z. Broder,et al.  Robust classification of rare queries using web knowledge , 2007, SIGIR.

[34]  Maxine Eskénazi,et al.  Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts , 2007, NAACL.

[35]  G. Harry McLaughlin,et al.  SMOG Grading - A New Readability Formula. , 1969 .

[36]  Qiang Yang,et al.  Clickthrough Log Analysis by Collaborative Ranking , 2010, Proceedings of the AAAI Conference on Artificial Intelligence.