Data Mining, Management and Visualization in Large Scientific Corpuses

Organizing scientific papers helps researchers derive meaningful insights from published scientific resources efficiently, keep pace with rapid technological change, and thereby supports new scientific discovery. In this paper, we experiment with text mining and data management of scientific publications to collect and present information useful for research. For efficient data management and fast information retrieval, we employ four data stores, each chosen for its particular strengths: a semantic repository, an index and search repository, a document repository, and a graph repository. The results show that the combination of these four repositories can store and index publication data reliably and efficiently, and can therefore supply meaningful information to support scientific research.
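
As a rough illustration of the four-repository idea, the sketch below routes one publication record into four in-memory stand-ins: a document store, a set of subject-predicate-object triples, an inverted index, and a citation graph. It is a minimal sketch under stated assumptions, not the system described in the paper. The names `Publication`, `CorpusStore`, `hasTitle`, and `hasAuthor` are hypothetical, and a real deployment would replace each Python container with a dedicated store.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Publication:
    """Hypothetical record type; the field names are assumptions."""
    pub_id: str
    title: str
    abstract: str
    authors: List[str]
    cites: List[str] = field(default_factory=list)


class CorpusStore:
    """Toy facade over four in-memory stand-ins for the four repositories."""

    def __init__(self):
        self.documents = {}                      # document repository: id -> full record
        self.triples = set()                     # semantic repository: (subject, predicate, object)
        self.inverted_index = defaultdict(set)   # index and search repository: term -> ids
        self.citation_graph = defaultdict(set)   # graph repository: id -> ids it cites

    def add(self, pub):
        # Each store receives the view of the record it is suited to.
        self.documents[pub.pub_id] = pub
        self.triples.add((pub.pub_id, "hasTitle", pub.title))
        for author in pub.authors:
            self.triples.add((pub.pub_id, "hasAuthor", author))
        for term in f"{pub.title} {pub.abstract}".lower().split():
            self.inverted_index[term].add(pub.pub_id)
        self.citation_graph[pub.pub_id].update(pub.cites)

    def search(self, term):
        """Keyword lookup, answered by the inverted index."""
        return sorted(self.inverted_index.get(term.lower(), set()))

    def authors_of(self, pub_id):
        """Metadata lookup, answered by the triple set."""
        return sorted(o for s, p, o in self.triples
                      if s == pub_id and p == "hasAuthor")

    def cited_by(self, pub_id):
        """Reverse citation lookup, answered by the graph."""
        return sorted(src for src, targets in self.citation_graph.items()
                      if pub_id in targets)
```

A caller would add records with `store.add(Publication(...))` and then direct each kind of question to the store that answers it cheaply: `search` for keywords, `authors_of` for metadata, `cited_by` for citation links, and `store.documents[pub_id]` for the full record.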
