From Geoportals to Geographic Knowledge Portals

We present the application of Latent Semantic Analysis (LSA) in combination with recommender systems to enhance discovery in geoportals. As a basis for discovery, metadata for spatial data and services, as well as for non-spatial resources such as documents and scientific papers, is created and registered (semi-)automatically in the geoportal's catalogue. Links that are not inherent in the data itself are established using LSA, based on the semantic similarity of the resources' textual content. This marks the transition from unstructured data to structured (metadata) information, which serves as a basis for generating knowledge. The metadata is integrated into a recommender system that returns a ranked list showing (1) what other users viewed and (2) the related resources discovered by the LSA workflow. Based on the assumptions that similar texts have something in common and that users are likely to be interested in what other users viewed, the recommendations provide a broader but also more precise search result: broader because the recommender engine considers additional information, and more precise because it ranks resources by the discovery experience of other users and by the likelihood that the documents are related to each other.
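The two signals described above, LSA-based text similarity and what other users viewed, can be illustrated with a minimal sketch. It is not the implementation behind the presented system. The function names, the toy metadata texts, the co-view counts, the use of raw term counts without weighting, and the linear blending weight `alpha` are all assumptions made for illustration.

```python
import numpy as np

def lsa_doc_vectors(docs, k=2):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of raw counts (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Keep the k largest singular values; one row per document.
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def recommend(target, sim, co_views, alpha=0.5):
    """Rank all other resources for `target`.

    Score = alpha * LSA similarity + (1 - alpha) * share of co-views.
    The linear blend is an assumption of this sketch.
    """
    total = co_views[target].sum()
    scores = []
    for j in range(sim.shape[0]):
        if j == target:
            continue
        view_share = co_views[target, j] / total if total else 0.0
        text_score = max(float(sim[target, j]), 0.0)  # ignore negative cosines
        scores.append((j, alpha * text_score + (1 - alpha) * view_share))
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Hypothetical metadata texts: two drought records, two forestry records.
docs = [
    "drought index rainfall data",
    "rainfall drought monitoring data",
    "forest biodiversity species map",
    "species biodiversity forest inventory",
]
# Hypothetical symmetric counts of how often two resources were viewed together.
co_views = np.array([
    [0, 1, 4, 0],
    [1, 0, 0, 0],
    [4, 0, 0, 2],
    [0, 0, 2, 0],
])

sim = cosine_matrix(lsa_doc_vectors(docs, k=2))
ranking = recommend(0, sim, co_views, alpha=0.5)
print([j for j, _ in ranking])
```

With `alpha=0.5` the textually related drought record can outrank a frequently co-viewed but unrelated resource. Setting `alpha=0.0` reduces the ranking to "what other users viewed" alone. A production workflow would typically apply a term-weighting scheme before the decomposition and use a far larger corpus.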
