Formulating Representative Features with Respect to Genre Classification

Document classification is one of the most fundamental steps in enabling the search, selection, and ranking of digital material according to its relevance to a predefined query. As such, it is a valuable means of knowledge discovery and an essential part of the effective and efficient management of digital documents in a repository, library, or archive.
