Can back-of-the-book indexes be automatically created?

Creating back-of-the-book indexes remains one of the few manual tasks related to publishing. Inspired by how human indexers create back-of-the-book indexes, we present a new domain-independent, corpus-free and training-free automation approach. Given a book, index terms are selected sequentially according to an indexability score. This score encodes the structural information residing in the book, together with a novel context-aware measure of term informativeness that draws on web knowledge bases such as Wikipedia. Through extensive experiments on books from various domains, we show that our approach is more effective and practical than previous approaches based on keyword extraction and supervised learning.
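The abstract does not say how the structural and informativeness signals are combined, or how the sequential selection stops. As a rough illustration only, the sketch below makes several assumptions. Each candidate term already carries a structure score and an informativeness score in [0, 1]. The two scores are mixed linearly with a weight `alpha`, and terms are taken greedily in descending score order until a budget or a score cutoff is reached. `Candidate`, `indexability`, `select_index_terms`, `alpha` and `min_score` are all hypothetical names. This is not the authors' formulation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    term: str
    structure: float        # hypothetical: e.g. how strongly the term is tied to headings / ToC entries, in [0, 1]
    informativeness: float  # hypothetical: a precomputed context-aware informativeness value, in [0, 1]


def indexability(c: Candidate, alpha: float = 0.5) -> float:
    """Assumed linear mix of the two signals; alpha is an illustrative weight."""
    return alpha * c.structure + (1 - alpha) * c.informativeness


def select_index_terms(
    candidates: list[Candidate], budget: int, min_score: float = 0.0, alpha: float = 0.5
) -> list[str]:
    """Greedily select terms one at a time in descending indexability order.

    Stops when the budget is filled or the score drops below min_score.
    A term whose case-folded form was already selected is skipped
    (a simple redundancy rule, assumed for illustration).
    """
    selected: list[str] = []
    seen: set[str] = set()
    for c in sorted(candidates, key=lambda c: indexability(c, alpha), reverse=True):
        if len(selected) >= budget or indexability(c, alpha) < min_score:
            break  # list is sorted, so no later candidate can qualify
        key = c.term.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        selected.append(c.term)
    return selected


if __name__ == "__main__":
    cands = [
        Candidate("Hash table", 0.9, 0.8),     # score 0.85
        Candidate("hash table", 0.2, 0.8),     # score 0.5, duplicate after case folding
        Candidate("the", 0.0, 0.05),           # score 0.025, below the cutoff
        Candidate("Binary search", 0.6, 0.7),  # score 0.65
    ]
    print(select_index_terms(cands, budget=3, min_score=0.3))
```

With the toy values above, the selection is `["Hash table", "Binary search"]`. The lowercase duplicate is skipped, and "the" falls below the cutoff. A real system would also need candidate extraction and actual structure and informativeness estimates, which this sketch takes as given.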

Zhaohui Wu | C. Lee Giles | Prasenjit Mitra | Zhenhui Li
