Topic Summary Views for Exploration of Large Scholarly Datasets

In this article, we present the E-sch approach for exploring large scholarly datasets through topic summary views. The goal of E-sch is to semantically summarize a dataset covering a potentially very large number of scholarly publications (e.g., millions) as a list of a few thousand topics, and then to condense these into a final list of hundreds of topic summaries that can be used to analyze research dynamics and evolution at a more semantic, higher level of inquiry. Filter and Slice operators are defined in E-sch to support interactive scholarly data exploration along thematic and temporal perspectives.
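
As an illustration only: the abstract names the Filter and Slice operators without defining them, so the following minimal sketch rests on assumptions. It assumes Filter is the thematic operator (keeping topic summaries that match a keyword) and Slice is the temporal one (restricting summaries to a year interval). The `TopicSummary` structure, the function names, and the toy data are all hypothetical and are not taken from E-sch.

```python
from dataclasses import dataclass


@dataclass
class TopicSummary:
    """Hypothetical topic summary: a label, its top terms, and publication counts per year."""
    label: str
    terms: tuple
    counts_by_year: dict


def filter_topics(summaries, keyword):
    """Assumed thematic Filter: keep summaries whose label or terms mention the keyword."""
    key = keyword.lower()
    return [
        s for s in summaries
        if key in s.label.lower() or any(key in t.lower() for t in s.terms)
    ]


def slice_topics(summaries, start_year, end_year):
    """Assumed temporal Slice: restrict each summary to [start_year, end_year].

    Summaries with no publications in the interval are dropped.
    """
    sliced = []
    for s in summaries:
        counts = {y: n for y, n in s.counts_by_year.items() if start_year <= y <= end_year}
        if counts:
            sliced.append(TopicSummary(s.label, s.terms, counts))
    return sliced


# Toy data, invented for this example only.
summaries = [
    TopicSummary("topic models", ("lda", "dirichlet", "latent"), {2008: 12, 2012: 30}),
    TopicSummary("citation networks", ("citation", "graph", "influence"), {2015: 9}),
    TopicSummary("clustering", ("cluster", "similarity", "topic"), {2005: 4, 2011: 21}),
]

thematic = filter_topics(summaries, "topic")    # thematic perspective
recent = slice_topics(thematic, 2010, 2015)     # temporal perspective
```

Because both functions take and return a list of summaries, they can be chained in either order, which is one plausible way to support the interactive exploration the abstract describes.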
