1. Off-topic results: e.g., the query [Gold Rush] produces results related to tourism and real estate.
2. Non-academic sites: sites that are not appropriate for academic citation, e.g., user-generated sites like Yahoo Answers and highly biased sites like [1].
3. Wrong level of detail: e.g., pages targeted at college students.

We presented this problem and an initial solution for eliminating off-topic results in [5]. In this paper, we improve on that solution and present novel approaches for solving the other two problems.

The core idea of this paper is that we can infer which documents are relevant for a given query in the context of the course by semantically comparing matching documents on the Web against expert reference material from the course textbook. We use Google’s Custom Search Engine (CSE) infrastructure (http://google.com/cse) as a tool to explore this insight. CSE provides an API for biasing Google’s search results, boosting or demoting pages based on the sites they come from and the query rewrites they match. Our objective in this paper is to automatically create a CSE for a course from the course’s textbook. We achieve this by automatically identifying a ranked list of sites that are about
[1] Ryen W. White, et al. Personalizing web search results by reading level. CIKM '11, 2011.
[2] W. Bruce Croft, et al. Automatic recognition of reading levels from user queries. SIGIR '04, 2004.
[3] Daniela Petrelli, et al. Hybrid Search: Effectively Combining Keywords and Semantic Searches. ESWC, 2008.
[4] Matt Wytock, et al. Course-specific search engines: semi-automated methods for identifying high quality topic-specific corpora. WWW '13 Companion, 2013.
[5] James A. Hendler, et al. Web 3.0: The Dawn of Semantic Search. Computer, 2010.
[6] Ramanathan V. Guha, et al. Semantic search. WWW '03, 2003.
[7] Laura Farinetti, et al. Ontology Driven Semantic Search. 2004.