Classifying and Ranking: The First Step Towards Mining Inside Vertical Search Engines

Vertical Search Engines (VSEs), which usually work on specific domains, are designed to answer complex queries from professional users. VSEs usually have large repositories of structured instances. Traditional instance ranking methods do not consider the categories that instances belong to. However, users with different interests usually care only about the ranking list within their own communities. In this paper we design a ranking algorithm, ZRank, to rank classified instances according to their importance in specific categories. To test our idea, we develop a scientific paper search engine, CPaper. By employing instance classification and ranking algorithms, we discover some facts that are helpful to users with different interests.
