POLYPHONET: An advanced social network extraction system from the Web

Social networks play important roles in the Semantic Web, in areas such as knowledge management, information retrieval, and ubiquitous computing. We propose a social network extraction system called POLYPHONET, which employs several advanced techniques to extract relations between persons, to detect groups of persons, and to obtain keywords for a person. Search engines, especially Google, are used to measure the co-occurrence of information and to obtain Web documents. Several studies have used search engines to extract social networks from the Web; our research advances this work in four respects. First, we reduce the related methods to simple pseudocode using Google so that we can build integrated systems. Second, we develop several new algorithms for social network mining, such as those to classify relations into categories, to make extraction scalable, and to obtain and utilize person-to-word relations. Third, every module is implemented in POLYPHONET, which has been used at four academic conferences, each with more than 500 participants; we give an overview of that system. Finally, we propose a novel architecture called Iterative Social Network Mining. It is built from simple modules that use Google and is characterized by scalability and by relate-identify processes: identification of each entity and extraction of relations are repeated to obtain a more precise social network.
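
As a rough illustration of measuring co-occurrence with a search engine, the sketch below scores each pair of names from hit counts and keeps the pairs above a threshold. It is not the POLYPHONET implementation: the Jaccard coefficient is one common choice of co-occurrence measure and is an assumption here, as are the names `hit_count`, `extract_edges`, and `FAKE_HITS`, the query format, and the threshold. The hit counts are made up so that the example runs without any search API.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def jaccard(n_x: int, n_y: int, n_xy: int) -> float:
    """Jaccard coefficient computed from hit counts for x, y, and 'x AND y'."""
    union = n_x + n_y - n_xy
    return n_xy / union if union > 0 else 0.0


def extract_edges(
    names: List[str],
    hit_count: Callable[[str], int],  # hypothetical: query string -> number of hits
    threshold: float,
) -> Dict[Tuple[str, str], float]:
    """Return the name pairs whose co-occurrence score reaches the threshold."""
    # One query per person, then one query per pair of persons.
    single = {name: hit_count(f'"{name}"') for name in names}
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(names, 2):
        both = hit_count(f'"{a}" "{b}"')
        score = jaccard(single[a], single[b], both)
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Made-up hit counts standing in for a real search engine (assumption).
FAKE_HITS = {
    '"Alice"': 100,
    '"Bob"': 50,
    '"Carol"': 80,
    '"Alice" "Bob"': 30,
    '"Alice" "Carol"': 2,
    '"Bob" "Carol"': 0,
}

edges = extract_edges(
    ["Alice", "Bob", "Carol"],
    hit_count=lambda query: FAKE_HITS.get(query, 0),
    threshold=0.1,
)
print(edges)
```

Note that this naive version issues one query per pair of names, so the number of queries grows quadratically with the number of persons; this is the kind of cost that scalable extraction has to reduce.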
