Mining Domain Similarity to Enhance Digital Indexing

Indexing research articles in scientific publications can be arduous. Authors tag their articles with the topics or domains relevant to their research. A publication's organizers may tag them with the publication's own broad topics. A third party may index or tag the articles based on its subject knowledge. Indexing can therefore be uneven, owing to gaps in third parties' area knowledge or to the niche topics that authors choose to represent their work. Publications may have indexing or tagging schemes in place, but such schemes cannot keep up with the continuously changing research landscape and need to be updated as state-of-the-art research produces new topics and domains. Our technique addresses this problem. We present a methodology that measures similarity among domains extracted from the content of research papers and clusters related domains. Analysis of these clusters provides insight into how existing indexing schemes may be enhanced by adding newer domains.
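The general idea of scoring domain pairs for similarity and then grouping related domains can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering algorithm, so everything below is an assumption made for illustration: a simple token-overlap (Jaccard) score stands in for the similarity measure, a threshold-based single-link grouping stands in for the clustering step, and the function names, the threshold value, and the sample domain names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-overlap similarity between two domain names, in [0.0, 1.0].

    Assumption: a stand-in measure, not the one used in the paper.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_domains(domains, threshold=0.3):
    """Group domains linked by a similarity of at least `threshold`.

    Assumption: single-link grouping via union-find, chosen for brevity.
    """
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Merge the clusters of every sufficiently similar pair.
    for a, b in combinations(domains, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical domain names, not taken from the paper's data.
sample_domains = [
    "machine learning",
    "deep learning",
    "text mining",
    "data mining",
    "graph theory",
]

for cluster in cluster_domains(sample_domains):
    print(cluster)
```

In this sketch, a cluster that contains both a domain already present in an indexing scheme and one that is absent would suggest where the scheme could be extended. A lexical score like this one only captures shared words, so a semantic measure would be needed to relate domains that use different vocabulary.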
