Keyword and Keyphrase Extraction using Newton's Law of Universal Gravitation

In recent years, the amount of data collected by computational systems has surged. This data can be useful in many applications and fields, particularly in Big Data Analytics. However, the larger the collection, the harder it becomes to discover the important information in it. Automatic Document Summarization (ADS) systems are suited to the task of outlining useful data. An ADS system takes a text document as input and outputs a semantically relevant summary of its content. This summary can be further condensed into keywords or keyphrases. This paper proposes a novel unsupervised approach to automatic keyword and keyphrase extraction based on Newton's Law of Universal Gravitation. The approach captures meaningful text by incorporating both the physical structure of a document and the relationships discovered between highly related words. Our model uses a new weighting method that combines the character length of a word with its frequency within the document to simulate a mass. The model then computes the force of attraction between pairs of words and ranks the word pairs by this force to extract keywords and keyphrases. Experimental results on several text documents demonstrate that the proposed approach improves on state-of-the-art models.
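The ranking idea can be illustrated with a minimal sketch. Newton's law gives the force between two masses as F = G * m1 * m2 / r^2. The abstract does not state how length and frequency are combined into a mass, or how the distance between two words is measured, so the choices below are assumptions made only for illustration: mass is taken as character length times frequency, distance as the smallest gap (in tokens) between any occurrences of the two words, and G as 1. The function names are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_masses(tokens):
    """ASSUMPTION: mass = character length * frequency in the document."""
    freq = Counter(tokens)
    return {word: len(word) * count for word, count in freq.items()}


def min_distance(positions_a, positions_b):
    """ASSUMPTION: distance = smallest token gap between two words."""
    return min(abs(i - j) for i in positions_a for j in positions_b)


def rank_word_pairs(text, stopwords=frozenset(), G=1.0):
    """Return ((word_a, word_b), force) tuples, strongest attraction first."""
    tokens = tokenize(text)

    # Record where each non-stopword occurs; positions keep the original
    # token indices so that removed stopwords still count as distance.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        if token not in stopwords:
            positions[token].append(index)

    masses = word_masses([t for t in tokens if t not in stopwords])

    scored = []
    for a, b in combinations(sorted(positions), 2):
        r = min_distance(positions[a], positions[b])  # r >= 1 since a != b
        force = G * masses[a] * masses[b] / r ** 2
        scored.append(((a, b), force))

    # Sort by descending force; break ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    sample = "gravity pulls mass and mass pulls gravity"
    for pair, force in rank_word_pairs(sample, stopwords={"and"}):
        print(pair, force)
```

Under these assumptions, long and frequent words that occur close together receive the highest scores, so the top-ranked pairs act as candidate keyphrases and the words in them as candidate keywords. A different mass or distance definition would change the ranking, so the sketch should be read as one possible instantiation and not as the model evaluated in the paper.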