Automatically Generated Keywords: A Comparison to Author-Generated Keywords in the Sciences

This paper examines the differences between author-generated keywords and automatically generated keywords in one area of scientific and technical literature. Keywords produced by both methods are examined using inverse frequency and a maximum likelihood algorithm. By reducing the scope and size of the corpus of literature examined, this study more closely emulates the information-gathering processes of scientists and technologists. Care was taken in developing the sample, balancing statistical factors to allow interpretable outcomes and replication. The results indicate no statistically significant differences between the two techniques.
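
The abstract does not specify the exact formulas used, so the following is only an illustrative sketch. It assumes that "inverse frequency" refers to inverse document frequency and that the likelihood-based comparison resembles the simplified two-term log-likelihood keyness statistic common in corpus tools. The function names and the toy documents are hypothetical and are not taken from the paper.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df), where df is the number of documents containing term.

    `documents` is a list of token sets (an assumed representation).
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n_docs / doc_freq)


def log_likelihood_keyness(freq_target, size_target, freq_ref, size_ref):
    """Simplified two-term log-likelihood (G2) for one word in two corpora.

    Compares the observed frequency of a word in a target corpus and in a
    reference corpus with the frequencies expected if the word were equally
    likely in both.
    """
    combined = freq_target + freq_ref
    total = size_target + size_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical toy corpus: each document is a set of tokens.
toy_documents = [
    {"keyword", "extraction", "corpus"},
    {"corpus", "frequency"},
    {"variance", "sample"},
    {"corpus", "sample", "likelihood"},
]

idf_rare = inverse_document_frequency("keyword", toy_documents)
idf_common = inverse_document_frequency("corpus", toy_documents)
```

In this sketch, a term found in few documents ("keyword") receives a higher inverse document frequency than one found in most of them ("corpus"), and a word with the same relative frequency in both corpora receives a keyness score of zero.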