Zero, single, or multi? Genre of web pages through the users' perspective

The study presented in this article investigates to what extent classifying a web page by a single genre matches the users' perspective. The extent of agreement on a single genre label for a web page can help determine whether a different classification scheme, one that goes beyond single-genre labelling, is needed. My hypothesis is that a single genre label does not account for the users' perspective. To test this hypothesis, I submitted a small set of web pages (25) to a large number of web users (135 subjects), asking them to assign exactly one genre label to each page. Users could choose from a list of 21 genre labels or select one of two 'escape' options: 'Add a label' and 'I don't know'. The rationale was to observe the level of agreement on a single genre label per web page and to draw conclusions about the appropriateness of limiting the assignment to a single label in genre classification of web pages. Results show that users largely disagree on the label to be assigned to a web page.
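One simple way to quantify "level of agreement on a single genre label per web page" is the share of respondents who chose the most frequent label for that page. The abstract does not say which agreement measure the study used, so the sketch below is only an illustration under that assumption. The function name `modal_agreement` and the sample responses are hypothetical, not data from the study.

```python
from collections import Counter


def modal_agreement(labels):
    """Return (most frequent label, share of responses that chose it).

    `labels` holds one entry per respondent for a single web page.
    Escape options such as "I don't know" are counted like any other label.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Made-up responses for one page (illustrative only, not study data).
responses = ["blog", "blog", "personal home page", "I don't know", "blog", "diary"]
top_label, share = modal_agreement(responses)
print(f"{top_label}: {share:.0%} of respondents agree")
```

A share close to 1.0 would indicate that a single label captures most users' view of the page; a low share would point in the direction the abstract reports, namely that users largely disagree.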
