Web page classification on child suitability

Children spend significant amounts of time on the Internet. Recent studies have shown that during these periods they are often not under adult supervision. This work presents an automatic approach to identifying web pages suitable for children, based on topical and non-topical aspects of the pages. We discuss the characteristics of children's web sites with respect to recent findings in child psychology and cognitive science. Finally, we evaluate our approach in a large-scale user study, finding that it compares favourably to state-of-the-art methods while approximating human performance.
