Cross-Testing a Genre Classification Model for the Web

The main aim of the experiments described in this chapter is to investigate ways of assessing the robustness and stability of an Automatic Genre Identification (AGI) model for the web. More specifically, the chapter presents and analyses a series of comparisons across four genre collections. I call this comparative approach cross-testing.
