Suspicion scoring based on guilt-by-association, collective inference, and focused data access

We describe a guilt-by-association system that can be used to rank entities by their suspiciousness. We demonstrate the algorithm on a suite of data sets generated by a terrorist-world simulator developed under a DoD program. The data sets consist of thousands of people and some known links between them. We show that the system ranks truly malicious individuals highly, even if only relatively few are known to be malicious ex ante. When used as a tool for identifying promising data-gathering opportunities, the system focuses on gathering more information about the most suspicious people and thereby increases the density of linkage in appropriate parts of the network. We assess performance under noisy prior knowledge (score quality varies by data set under moderate noise), and we test whether augmenting the network with prior scores based on profiling information improves the scoring (it doesn’t). Although the level of performance reported here would not support direct action on all data sets, it does recommend considering network-scoring techniques as a new source of evidence in decision making. For example, the system can operate on networks far larger and more complex than a human analyst could process.
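The abstract combines two ideas: scoring everyone from a few known-malicious seeds plus the links between people, and then using those scores to decide whose links to gather next. The sketch below illustrates one generic way to do both; it is not necessarily the exact procedure used in the paper. It assumes an undirected graph with nonnegative link weights, known people clamped to 1.0 (malicious) or 0.0 (benign), and a uniform prior of 0.5 for everyone else. Scoring uses a weighted-vote relational-neighbor estimate (each unknown person's score is the weighted mean of their neighbors' scores), iterated as a simple form of collective inference. The "focused data access" step just returns the k highest-scoring people not yet queried. The names `build_graph`, `suspicion_scores`, and `select_queries`, the prior, the iteration cap, and the tolerance are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: person -> {linked person: link weight}.
Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric (undirected) adjacency dict; repeated links add their weights."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue  # ignore self-links
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return dict(graph)


def suspicion_scores(
    graph: Graph,
    known: Dict[str, float],
    prior: float = 0.5,
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Dict[str, float]:
    """Guilt-by-association scores via an iterated weighted-vote relational-neighbor estimate.

    Known people keep their given score (1.0 = malicious, 0.0 = benign). Everyone
    else starts at `prior` and is repeatedly reset to the weighted mean of their
    neighbors' current scores (synchronous updates) until the scores stop changing.
    """
    nodes: Set[str] = set(graph) | set(known)
    for nbrs in graph.values():
        nodes.update(nbrs)
    scores = {n: known.get(n, prior) for n in nodes}
    for _ in range(max_iter):
        new_scores = dict(scores)
        for n in nodes:
            if n in known:
                continue  # evidence is clamped, never re-estimated
            nbrs = graph.get(n, {})
            total = sum(nbrs.values())
            if total > 0:  # isolated people simply keep the prior
                new_scores[n] = sum(w * scores[m] for m, w in nbrs.items()) / total
        delta = max((abs(new_scores[n] - scores[n]) for n in nodes), default=0.0)
        scores = new_scores
        if delta < tol:
            break
    return scores


def select_queries(scores: Dict[str, float], queried: Set[str], k: int) -> List[str]:
    """Focused data access: the k highest-scoring people whose links are not yet gathered."""
    candidates = [n for n in scores if n not in queried]
    candidates.sort(key=lambda n: (-scores[n], n))  # ties broken by name for determinism
    return candidates[:k]


if __name__ == "__main__":
    # A chain a-b-c-d-e where a is known malicious and e is known benign.
    g = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "e", 1.0)])
    s = suspicion_scores(g, known={"a": 1.0, "e": 0.0})
    print(s["b"], s["c"], s["d"])  # 0.75 0.5 0.25
    print(select_queries(s, queried={"a"}, k=2))  # ['b', 'c']
```

On the chain example, suspicion decays with distance from the known-malicious person, and the next people to investigate are the two closest to that person. Clamping the known people and using synchronous updates keeps the sketch simple; with nonnegative weights every estimate is a weighted average of neighbor scores, so it always stays between the lowest and highest seed or prior value.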

Foster Provost | Sofus A. Macskassy
