Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach

This paper presents a study on automatic title generation for scientific articles that takes into account sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is the type of information the author conveys in a textual unit, for example the background, method, or result of the research. The experiment focused on extracting research purpose and research method information for inclusion in a computer-generated title. Sentences are first classified into rhetorical categories and then filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using either a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted on datasets from two domains: computational linguistics and chemistry. On average, computer-generated titles obtained F1-measure scores of 0.109 to 0.255 when compared with the original titles. In a human evaluation, the automatically generated titles were deemed ‘relatively acceptable’ in the computational linguistics domain and ‘not acceptable’ in the chemistry domain. We conclude that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
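To make the reported F1-measure concrete, the following is a minimal sketch of how a generated title might be scored against the original title. It assumes unigram overlap on lowercased alphanumeric tokens; the abstract does not state the exact matching scheme, so the tokenizer and the function names `tokenize` and `title_f1` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_f1(generated, original):
    """Unigram-overlap F1 between a generated title and the original title.

    Precision is the share of generated tokens that also occur in the
    original; recall is the share of original tokens that were recovered.
    Repeated tokens are matched at most as often as they occur in both.
    """
    gen_counts = Counter(tokenize(generated))
    ref_counts = Counter(tokenize(original))
    overlap = sum((gen_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative titles only; these are not taken from the study's data.
    score = title_f1("automatic title generation",
                     "title generation for articles")
    print(round(score, 3))
```

Under this assumed scheme, a score in the reported range would mean that only a small share of title words is shared between the generated and the original title, which is consistent with the mixed human evaluation results.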
