A Centroid and Relationship based Clustering for Organizing Research Papers

Finding research papers on a particular topic of study is one of the most time-consuming activities for many people, including students, professors and researchers. Researchers have to search, read and analyze many research papers, e-books and other documents to determine what they contain and to extract knowledge from them. Much of this material is unstructured text spread over long pages, which takes a long time to process, search, read and analyze. Organizing research papers by subject or topic can make the search process easier. We propose a new method for research paper organization and retrieval that is suited to closely related research papers and intertwined research topics. With our centroid and relationship based clustering approach, research papers are grouped under their most probable research topics or subjects. To determine topic membership, the approach considers relationships such as common terms in paper titles, in keywords, in the titles of referenced papers, and in the most frequent sentences. To address the high dimensionality of text documents, only the most important information in each paper is considered, and multi-word terms and frequently occurring phrases are used as features in the clustering process. Experiments show that the approach is effective.
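The abstract names the ingredients of the method but not its formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. Each paper is reduced to the four relationship fields the abstract lists (title, keywords, referenced titles, most frequent sentences). Each field becomes a bag of unigrams and two-word phrases, and a paper is assigned to the topic whose centroid gives the highest weighted cosine similarity across fields. Centroids are then rebuilt from their members and the assignment is repeated. The stopword list, the field weights in `WEIGHTS`, the choice of cosine similarity, the seed descriptions used to initialise centroids, and the helper names (`phrases`, `cosine`, `field_vectors`, `score`, `cluster`) are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not say which words are filtered.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "by"}

# The four relationship fields named in the abstract. The weights are
# illustrative assumptions, not values reported by the authors.
FIELDS = ("title", "keywords", "ref_titles", "top_sentences")
WEIGHTS = {"title": 0.4, "keywords": 0.3, "ref_titles": 0.2, "top_sentences": 0.1}


def phrases(text, n_max=2):
    """Return unigrams followed by multi-word phrases of up to n_max words.

    Stopwords are dropped before phrases are formed, so a phrase may
    span a removed stopword ("documents with frequent" -> "documents frequent").
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def field_vectors(paper):
    """Map each field (a list of strings) to a Counter of its phrase features."""
    return {f: Counter(p for text in paper[f] for p in phrases(text)) for f in FIELDS}


def score(vectors, centroid):
    """Weighted sum of per-field cosine similarities to one topic centroid."""
    return sum(WEIGHTS[f] * cosine(vectors[f], centroid) for f in FIELDS)


def cluster(papers, seeds, iterations=5):
    """Assign each paper to the topic whose centroid scores highest.

    `seeds` maps a topic name to a short description used to initialise
    its centroid (hypothetical: the abstract does not say how centroids
    are initialised).
    """
    vecs = [field_vectors(p) for p in papers]
    centroids = {t: Counter(phrases(text)) for t, text in seeds.items()}
    assignment = {}
    for _ in range(iterations):
        # Assignment step: best-scoring topic for every paper.
        assignment = {i: max(centroids, key=lambda t: score(v, centroids[t]))
                      for i, v in enumerate(vecs)}
        # Update step: each centroid becomes its seed features plus the
        # summed features of every paper currently assigned to it.
        centroids = {t: Counter(phrases(text)) for t, text in seeds.items()}
        for i, topic in assignment.items():
            for f in FIELDS:
                centroids[topic].update(vecs[i][f])
    return assignment


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only.
    papers = [
        {"title": ["Clustering text documents with frequent phrases"],
         "keywords": ["text clustering", "frequent phrases"],
         "ref_titles": ["Comparing document clustering methods"],
         "top_sentences": ["We group documents that share frequent phrases."]},
        {"title": ["Summarizing text by sentence extraction"],
         "keywords": ["text summarization", "sentence extraction"],
         "ref_titles": ["Measuring sentence similarity for summarization"],
         "top_sentences": ["Sentences are ranked and extracted to form a summary."]},
    ]
    seeds = {"clustering": "document clustering text clustering",
             "summarization": "text summarization sentence extraction"}
    print(cluster(papers, seeds))  # {0: 'clustering', 1: 'summarization'}
```

Cosine similarity ignores vector length, so summing member vectors rather than averaging them only changes how strongly the seed text weighs against the members; keeping the seed in every rebuilt centroid is one way to stop a topic from drifting away from its description as papers join it.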
