WINACS: construction and analysis of web-based computer science information networks

WINACS (Web-based Information Network Analysis for Computer Science) is a project that draws on recent developments in data science to construct a Web-based computer science information network and to discover, retrieve, rank, cluster, and analyze it. With the rapid development of the Web, huge amounts of information are available in the form of Web documents, structures, and links. It has long been a goal of the database and Web communities to harvest such information and to reconcile the unstructured nature of the Web with the neat, semi-structured schemas of the database paradigm. Taking computer science as a dedicated domain, WINACS first discovers related Web entity structures and then constructs a heterogeneous computer science information network, which it ranks, clusters, and analyzes in order to support intelligent and analytical queries.
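To make the idea of a heterogeneous information network concrete, the following is a minimal sketch, not the WINACS implementation. It assumes a toy schema with three node types (author, paper, venue); the class `HeteroNetwork`, the method `meta_path_counts`, and all sample names are hypothetical and introduced only for illustration. It shows how typed nodes and links can answer a simple analytical query by counting path instances along a sequence of node types.

```python
from collections import defaultdict


class HeteroNetwork:
    """Toy heterogeneous information network with typed nodes (illustrative only)."""

    def __init__(self):
        self.node_type = {}           # node name -> type label
        self.adj = defaultdict(set)   # undirected adjacency lists

    def add_node(self, name, ntype):
        self.node_type[name] = ntype

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node, ntype):
        """Neighbors of `node` that have the given type."""
        return {n for n in self.adj[node] if self.node_type[n] == ntype}

    def meta_path_counts(self, start, types):
        """Count path instances from `start` that visit the given node types in order."""
        counts = {start: 1}
        for t in types:
            nxt = defaultdict(int)
            for node, c in counts.items():
                for n in self.neighbors(node, t):
                    nxt[n] += c
            counts = dict(nxt)
        return counts


def build_demo():
    """Build a small made-up network of authors, papers, and venues."""
    net = HeteroNetwork()
    for a in ("alice", "bob", "carol"):
        net.add_node(a, "author")
    for p in ("p1", "p2", "p3"):
        net.add_node(p, "paper")
    for v in ("KDD", "VLDB"):
        net.add_node(v, "venue")
    edges = [
        ("alice", "p1"), ("bob", "p1"),
        ("alice", "p2"), ("bob", "p2"),
        ("carol", "p3"),
        ("p1", "KDD"), ("p2", "KDD"), ("p3", "VLDB"),
    ]
    for a, b in edges:
        net.add_edge(a, b)
    return net


net = build_demo()
# Author -> paper -> author: how many papers link alice to each author.
coauthors = net.meta_path_counts("alice", ["paper", "author"])
# Author -> paper -> venue: how many of alice's papers appear at each venue.
venues = net.meta_path_counts("alice", ["paper", "venue"])
print(coauthors, venues)
```

Counts like these are one simple building block for ranking or clustering over typed links; a real system would also need entity discovery and name disambiguation before the network could be built.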
