Network quantification despite biased labels

The increasing availability of the participatory web and social media presents enormous opportunities to study human relations and collective behavior. Many decision-making applications need aggregate properties of a networked population, such as the proportion of actors belonging to a given category, rather than the category of each individual. Although data mining and machine learning researchers have developed many methods for link-based classification and relational learning, most are optimized to classify individual nodes in a network. To estimate the prevalence of a class in a network accurately, a quantification method is needed. In this work, two kinds of approaches are presented: quantification based on classification and quantification based on link analysis. Extensive experiments on several representative network datasets yield findings on the efficacy and robustness of the different quantification methods, providing insight toward quantifying the ebb and flow of online collective behavior at the macro level.
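The classification-based family of approaches mentioned above is often built on the standard "adjusted classify and count" correction: the raw fraction of positive predictions is debiased using the classifier's true and false positive rates, which are typically estimated by cross-validation on the labeled portion of the data. A minimal sketch (function and variable names are illustrative, not from this paper):

```python
def adjusted_classify_and_count(predictions, tpr, fpr):
    """Estimate the prevalence of the positive class in a population.

    predictions: 0/1 classifier decisions on the unlabeled population.
    tpr, fpr: the classifier's true/false positive rates, estimated
    beforehand (e.g. via cross-validation on labeled data).
    """
    observed = sum(predictions) / len(predictions)  # raw positive rate
    if tpr == fpr:
        return observed  # correction undefined; fall back to raw count
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion
```

For example, a classifier with tpr = 0.9 and fpr = 0.1 that labels 44% of nodes positive implies a corrected prevalence of (0.44 - 0.1) / 0.8 = 0.425. The clipping step matters in practice: when the true prevalence is near 0 or 1, sampling noise can push the uncorrected estimate outside [0, 1].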
