A SEMANTIC APPROACH TO ANALYZE SCIENTIFIC PAPER ABSTRACTS

Each domain and its underlying communities evolve over time, and each period is centered on specific topics that emerge from the textual sources characterizing the domain. Our analysis extends previous research performed on the same corpora, which focused on evaluating co-citations between articles in order to compute their importance scores (Grauwin and Jensen (1)). Our approach provides a general perspective on the domain by performing semantic comparisons between article abstracts using natural language processing techniques such as Latent Semantic Analysis, Latent Dirichlet Allocation, or semantic distances in lexicalized ontologies (namely WordNet). Moreover, graph visualizations generated with Gephi highlight the keywords of each paper and of the domain as a whole, and are complemented by a document similarity view and a table of keyword-abstract overlap scores. These views aim to shorten the learning curve of the domain and to facilitate the research process for anyone interested in a particular subject. Finally, to further support the benefits of our approach, we present potential refinements of the classification methods as future improvements.
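As a rough illustration of how a document similarity view can be fed to Gephi, the following sketch compares abstracts pairwise and exports the scores as a weighted edge list. It is a simplified stand-in, not the authors' pipeline: it uses plain TF-IDF cosine similarity instead of LSA, LDA or WordNet-based measures, and the stop-word list, the 0.1 similarity threshold and the helper names (`similarity_edges`, `to_gephi_csv`) are hypothetical choices made for illustration. The CSV columns (Source, Target, Weight, Type) follow the edge-list convention used by Gephi's spreadsheet import.

```python
import csv
import io
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller list and lemmatization.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with", "by", "we"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for every abstract."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many abstracts each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF, so terms present in every abstract keep a small positive weight.
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_edges(docs: dict[str, str], threshold: float = 0.1) -> list[tuple[str, str, float]]:
    """Return (source, target, weight) for every pair of abstracts reaching the threshold."""
    vectors = tfidf_vectors(docs)
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


def to_gephi_csv(edges: list[tuple[str, str, float]]) -> str:
    """Serialize the edges as a CSV edge list for Gephi's spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight, "Undirected"])
    return buffer.getvalue()


if __name__ == "__main__":
    # Toy abstracts, invented for illustration only.
    abstracts = {
        "p1": "Latent semantic analysis of scientific abstracts",
        "p2": "Semantic similarity between scientific paper abstracts",
        "p3": "Collaboration assessment in chat conversations",
    }
    print(to_gephi_csv(similarity_edges(abstracts)))
```

With these toy inputs only the pair p1/p2 shares vocabulary, so a single edge is exported. Projecting the TF-IDF vectors onto a truncated SVD of the term-document matrix would turn this into an LSA-style comparison; the thresholding and export steps would stay the same.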

[1] Sanford G. Thatcher. Stylish Academic Writing. Learn. Publ., 2014.

[2] Arthur C. Graesser et al. Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling, Proceedings of the 14th International Conference on Artificial Intelligence in Education, AIED 2009, July 6-10, 2009, Brighton, UK. AIED, 2009.

[3] Guo Zhang et al. Content-based citation analysis: The next generation of citation analysis. J. Assoc. Inf. Sci. Technol., 2014.

[4] Mihai Dascalu et al. Analyzing Discourse and Text Complexity for Learning and Collaborating - A Cognitive Approach Based on Natural Language Processing. Studies in Computational Intelligence, 2013.

[5] Kevin W. Boyack et al. Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? J. Assoc. Inf. Sci. Technol., 2010.

[6] Michael I. Jordan et al. Latent Dirichlet Allocation. J. Mach. Learn. Res., 2001.

[7] T. Landauer et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. 1997.

[8] Graeme Hirst et al. Evaluating WordNet-based Measures of Lexical Semantic Relatedness. CL, 2006.

[9] Stefan Trausan-Matu et al. Analyzing Emotional States Induced by News Articles with Latent Semantic Analysis. AIMSA, 2012.

[10] Heeyoung Lee et al. Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules. CL, 2013.

[11] Dietmar Wolfram et al. Measuring Scholarly Impact: Methods and Practice. 2014.

[12] Arthur C. Graesser et al. Artificial Intelligence in Education - Building Learning Systems that Care: From Knowledge Representation to Affective Modelling. Frontiers in Artificial Intelligence and Applications, Volume 200, 2009.

[13] James H. Martin et al. Speech and language processing: an introduction to natural language processing. 2000.

[14] George A. Miller. WordNet: A Lexical Database for English. HLT, 1992.

[15] Stefan Trausan-Matu et al. Mining Texts, Learner Productions and Strategies with ReaderBench. 2014.

[16] Rafael A. Calvo et al. Analysing Semantic Flow in Academic Writing. AIED, 2009.

[17] George A. Miller et al. WordNet: A Lexical Database for English. HLT, 1995.

[18] Alejandro Peña-Ayala et al. Educational Data Mining: Applications and Trends. 2013.

[19] Stefan Trausan-Matu et al. Validating the Automated Assessment of Participation and of Collaboration in Chat Conversations. Intelligent Tutoring Systems, 2014.