Intelligent polar cyberinfrastructure: enabling semantic search in geospatial metadata catalogue to support polar data discovery

Polar regions have garnered substantial research attention in recent years because they are key drivers of the Earth’s climate, a source of rich mineral resources, and home to a variety of marine life. However, global warming over the past century is pushing polar systems towards a tipping point: they are at high risk from melting snow and sea-ice cover, thawing permafrost, and acidification of the Arctic Ocean. To improve understanding of the polar environment, the National Science Foundation established a Polar Cyberinfrastructure (CI) program aimed at using advanced software architecture to support polar data analysis and decision-making. At the center of this Polar CI research are data resources and data discovery components that facilitate the search and retrieval of polar data. This paper reports our development of a semantic search tool that supports the intelligent discovery of polar datasets. The tool is built on latent semantic analysis techniques, which improve search performance by identifying hidden semantic associations between the terms used in the metadata of different datasets. The tool is implemented using an object-oriented design pattern and has been successfully integrated into a popular open-source metadata catalogue as a new semantic search capability. A semantic matrix is maintained persistently within the catalogue to store the semantic associations, and a dynamic update mechanism automatically updates these semantics whenever metadata are loaded into or removed from the catalogue. We explored the effect of rank reduction on the effectiveness of this semantic search module and demonstrated that it outperforms traditional search techniques.
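The core idea of latent semantic analysis, finding associations between terms that never appear together in the same metadata record, can be illustrated with a short sketch. This is not the tool described in the paper: the `LatentSemanticIndex` class, its method names, and the four toy records are hypothetical, and the sketch assumes whitespace tokenization, raw term counts (no TF-IDF or log-entropy weighting), and NumPy's `numpy.linalg.svd`. The term-document matrix A is truncated to rank k as A ≈ U_k S_k V_k^T; term associations are cosines between rows of U_k S_k, and a query is projected onto the same latent concepts (q^T U_k) before being compared with the projected records.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return (vocab, index, A): A[i, j] is the raw count of term i in document j."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.lower().split():
            A[index[t], j] += 1.0
    return vocab, index, A


def cosine_rows(X):
    """Pairwise cosine similarity between the rows of X (all-zero rows score 0)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    return Xn @ Xn.T


class LatentSemanticIndex:
    """Hypothetical rank-k LSA index: A is approximated by U_k S_k V_k^T."""

    def __init__(self, docs, k):
        self.vocab, self.index, A = build_term_document_matrix(docs)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = min(k, len(s))       # the rank cannot exceed the number of singular values
        self.U_k = U[:, :k]      # term loadings on the k latent concepts
        self.s_k = s[:k]         # retained singular values (largest first)
        # Records projected onto the latent concepts: A^T U_k (equal to V_k S_k).
        self.doc_coords = A.T @ self.U_k

    def term_similarity(self):
        """Semantic matrix of term-term associations: cosines of the rows of U_k S_k."""
        return cosine_rows(self.U_k * self.s_k)

    def _vectorize(self, text):
        """Count the query's known terms; terms outside the vocabulary are ignored."""
        q = np.zeros(len(self.vocab))
        for t in text.lower().split():
            if t in self.index:
                q[self.index[t]] += 1.0
        return q

    def search(self, query):
        """Cosine score of every record against the query, both projected with U_k."""
        q = self._vectorize(query) @ self.U_k
        q_norm = np.linalg.norm(q)
        d_norms = np.linalg.norm(self.doc_coords, axis=1)
        scores = np.zeros(len(self.doc_coords))
        ok = (d_norms > 0) & (q_norm > 0)   # avoid dividing by zero-length vectors
        scores[ok] = (self.doc_coords[ok] @ q) / (d_norms[ok] * q_norm)
        return scores


if __name__ == "__main__":
    records = [
        "sea ice extent arctic",
        "sea ice thickness arctic ocean",
        "permafrost thaw soil carbon",
        "permafrost soil temperature tundra",
    ]
    lsi = LatentSemanticIndex(records, k=2)
    print(np.round(lsi.search("ocean"), 3))
    i, j = lsi.index["ocean"], lsi.index["extent"]
    print(round(lsi.term_similarity()[i, j], 3))
```

With these toy records and k = 2, a search for `ocean` gives the first record (`sea ice extent arctic`), which never contains that word, a score close to 1, while the two permafrost records score close to 0. With k = 4 (no rank reduction) the association between `ocean` and `extent` falls back to 0, which illustrates why the choice of rank matters. In this sketch, keeping the associations current after records are added to or removed from the catalogue simply means rebuilding the index; the paper's own update mechanism is not reproduced here.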
