MapReduce for HITS Algorithm with Application to Chinese Word Networks

This paper presents the design and implementation of the HITS algorithm on Hadoop, with application to the analysis of Chinese word networks. To assess the importance of a word in a text, we build a directed Chinese word network from word adjacency information and, following the idea of the HITS algorithm, define authority and hub values for each word, associated with its in-degree and out-degree. To meet the large-scale computation requirements, we design an algorithm on the Hadoop platform using the map/reduce framework. In this implementation, the unit of division is a record composed of a word and its adjacency information, and the authority and hub values of the words are calculated in parallel by Map and Reduce. Experimental results show that the implementation is effective.
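The record-based computation described above can be illustrated with a minimal single-process sketch in Python. It is not the Hadoop implementation from the paper: the function names (`build_records`, `map_reduce`, `hits`), the choice to store both successor and predecessor lists in each record, the L2 normalization, and the fixed iteration count are all assumptions made for illustration. The input is assumed to be a list of already segmented words.

```python
from collections import defaultdict
from math import sqrt


def build_records(tokens):
    """Build one record per word: (successors, predecessors).

    An edge a -> b is added whenever word b directly follows word a.
    Storing both directions in the record is an assumption of this sketch.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            succ[a].add(b)
            pred[b].add(a)
    return {w: (sorted(succ[w]), sorted(pred[w])) for w in set(tokens)}


def map_reduce(records, mapper):
    """Apply mapper to every record, group emitted pairs by key, sum each group."""
    groups = defaultdict(float)
    for word, adjacency in records.items():
        for key, value in mapper(word, adjacency):  # map phase
            groups[key] += value                    # reduce phase (sum)
    return groups


def normalize(scores, words):
    """Scale scores to unit L2 norm; words that received no value get 0."""
    norm = sqrt(sum(scores.get(w, 0.0) ** 2 for w in words)) or 1.0
    return {w: scores.get(w, 0.0) / norm for w in words}


def hits(records, iterations=20):
    """Iterate the HITS updates, one map/reduce pass per score type."""
    words = list(records)
    hub = {w: 1.0 for w in words}
    auth = {w: 1.0 for w in words}
    for _ in range(iterations):
        # authority(v) = sum of hub(u) over edges u -> v:
        # each record sends its hub value to its successors.
        auth = normalize(
            map_reduce(records, lambda w, adj: ((v, hub[w]) for v in adj[0])),
            words,
        )
        # hub(u) = sum of authority(v) over edges u -> v:
        # each record sends its authority value to its predecessors.
        hub = normalize(
            map_reduce(records, lambda w, adj: ((u, auth[w]) for u in adj[1])),
            words,
        )
    return auth, hub
```

Calling `hits(build_records(tokens))` returns two dictionaries mapping each word to its authority and hub value. In an actual Hadoop job the grouping done by `map_reduce` would be performed by the shuffle stage, and the current scores would have to travel with the records rather than live in shared dictionaries.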
