Towards automatic assessment of government web sites

This paper presents an approach for the automatic assessment of web sites in large-scale e-Government surveys. The approach aims to supplement, and to some extent replace, the human evaluation that is typically the core of these surveys. The heart of the solution is a colony-inspired algorithm, called the lost sheep, which automatically locates targeted governmental material online. The algorithm centers on classifying link texts to determine whether a web page should be downloaded for further analysis. It is designed to work with minimal human interaction and to use the available resources as well as possible. With the lost sheep, the people carrying out a survey only need to provide sample data from a few web sites for each type of material sought. The algorithm then automatically locates the same type of material in the other web sites that are part of the survey. In this way it significantly reduces the need for manual work in large-scale e-Government surveys.
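
As an illustration of the link-text idea only, the following is a minimal sketch of deciding from a link's anchor text whether the linked page is worth downloading. It is not the lost sheep algorithm itself: the naive Bayes scoring, the class name `LinkTextClassifier`, the method names, the threshold, and the sample link texts are all assumptions introduced here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the link text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LinkTextClassifier:
    """Hypothetical two-class naive Bayes over link-text tokens."""

    def __init__(self):
        # Token counts and example counts per class (True = targeted material).
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, link_text, is_target):
        self.counts[is_target].update(tokenize(link_text))
        self.docs[is_target] += 1

    def log_odds(self, link_text):
        """Log-odds that the link leads to targeted material (add-one smoothing)."""
        vocab_size = len(set(self.counts[True]) | set(self.counts[False])) or 1
        total_true = sum(self.counts[True].values())
        total_false = sum(self.counts[False].values())
        score = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for token in tokenize(link_text):
            p_true = (self.counts[True][token] + 1) / (total_true + vocab_size)
            p_false = (self.counts[False][token] + 1) / (total_false + vocab_size)
            score += math.log(p_true / p_false)
        return score

    def should_download(self, link_text, threshold=0.0):
        # The threshold is an assumed tuning knob, not a value from the paper.
        return self.log_odds(link_text) > threshold


# Invented sample data standing in for the few hand-labelled web sites.
clf = LinkTextClassifier()
clf.train("Contact us", True)
clf.train("Contact information", True)
clf.train("News archive", False)
clf.train("Press releases", False)

follow_contact = clf.should_download("Contact the ministry")
follow_news = clf.should_download("News archive")
```

In this sketch the classifier is trained on link texts labelled by hand for a few sites and then applied to links found on the remaining sites, so only links that look promising trigger a download.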
