Discovery of Weather Forecast Web Resources Based on Ontology and Content-Driven Hierarchical Classification

Monitoring environmental information is critical both for following how important environmental events evolve and for everyday activities. In this work, we focus on discovering web resources that provide weather forecasts. To this end, we submit domain-specific queries to a general-purpose search engine and post-process the results with a hierarchical two-layer classification scheme. The top layer comprises two classification models: (a) one trained using ontology concepts as textual features, and (b) one trained using textual features learned from a training corpus. The bottom layer is a hybrid classifier that combines the results of the top layer. We evaluate the proposed technique by discovering weather forecast websites for cities in Finland and compare the results with previous work.
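The two-layer scheme can be illustrated with a minimal sketch. It is not the authors' implementation: the concept list, the toy training snippets, the log-odds term weighting, and the weighted-average combination (`alpha`, `threshold`) are all assumptions chosen to keep the example self-contained. The abstract does not specify which learning algorithm or combination rule is used.

```python
import math
import re
from collections import Counter

# Hypothetical ontology concepts; a real system would read them from a weather ontology.
ONTOLOGY_CONCEPTS = {"temperature", "wind", "humidity", "forecast", "precipitation", "rain"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def ontology_score(text):
    """Top layer (a): fraction of ontology concepts that occur in the text."""
    tokens = set(tokenize(text))
    return len(tokens & ONTOLOGY_CONCEPTS) / len(ONTOLOGY_CONCEPTS)


def train_corpus_model(docs, labels):
    """Top layer (b): learn per-term log-odds weights from a labeled corpus
    (add-one smoothing). This stands in for whatever learner is actually used."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label else neg).update(tokenize(doc))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        term: math.log((pos[term] + 1) / pos_total) - math.log((neg[term] + 1) / neg_total)
        for term in vocab
    }


def corpus_score(weights, text):
    """Squash the summed term weights into (0, 1); unseen terms contribute 0."""
    raw = sum(weights.get(term, 0.0) for term in tokenize(text))
    return 1.0 / (1.0 + math.exp(-raw))


def is_forecast_site(weights, text, alpha=0.5, threshold=0.5):
    """Bottom layer: hybrid decision from a weighted average of both top-layer scores.
    alpha and threshold are illustrative values, not tuned parameters."""
    combined = alpha * ontology_score(text) + (1.0 - alpha) * corpus_score(weights, text)
    return combined >= threshold


# Toy training corpus (invented snippets, for illustration only).
TRAIN_DOCS = [
    "weather forecast for helsinki temperature wind rain",
    "five day forecast tampere temperature humidity",
    "cheap hotels in helsinki book now",
    "helsinki football club news and results",
]
TRAIN_LABELS = [True, True, False, False]

MODEL = train_corpus_model(TRAIN_DOCS, TRAIN_LABELS)
```

In use, each search result's text (for example its title and snippet) would be passed to `is_forecast_site(MODEL, text)`, and only results judged positive would be kept as candidate weather forecast resources.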
