Estimating web site readability using content extraction

Nowadays, people search for information primarily on the WWW. From a user's perspective, readability is an important criterion for measuring the accessibility, and thereby the quality, of information. We show that modern content extraction algorithms help to estimate the readability of a web document quite accurately.
