A group-detection algorithm attempts to identify, within relational data, sets of entities that belong to specific groups or subsystems, based on records of interactions among small subsets of the entities. For example, such algorithms may be used to detect groups (or systems) of interacting proteins in biological networks based on multiple experiments, where each experiment probes only a small subset of the studied system. Measurements are typically noisy because they contain extraneous entities that are not members of the groups being studied. Therefore, a statistical characterization of group-finding performance is needed. This paper discusses metrics for measuring the probabilistic performance of group-detection algorithms. The metrics may be used to compare algorithms and to assess their performance in Monte Carlo simulation studies. We show that several traditional performance metrics are deficient when the size of a group is very small compared to the size of the population of entities being considered. Moreover, a pair of classical metrics (such as sensitivity and specificity, or recall and precision) must be used to track the two types of errors. To address these two issues, a new information-theoretic metric, termed proficiency, is introduced. Proficiency may be used to measure the performance of any detection algorithm, including classical hypothesis tests in statistics.
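To make the small-group deficiency concrete, the sketch below computes the classical metrics for a hypothetical scenario (the population size, group size, and detector output are illustrative assumptions, not numbers from the paper). With a 10-member group in a population of 10,000, specificity is nearly perfect even for a detector whose false positives outnumber its true positives:

```python
# Illustrative sketch (not from the paper): classical detection metrics for
# a hypothetical small-group scenario. All counts below are assumptions
# chosen to show why specificity is uninformative when the group is tiny
# relative to the population.

def detection_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, precision) from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of the group found
    specificity = tn / (tn + fp)   # fraction of non-members correctly rejected
    precision = tp / (tp + fp)     # fraction of detections that are real members
    return sensitivity, specificity, precision

# Hypothetical population of 10,000 entities containing a 10-member group.
# The detector flags 20 entities, only 5 of which are true members.
tp, fp = 5, 15
fn = 10 - tp
tn = 10_000 - 10 - fp

sens, spec, prec = detection_metrics(tp, fp, fn, tn)
print(f"sensitivity = {sens:.3f}")   # 0.500
print(f"specificity = {spec:.4f}")   # 0.9985 -- near-perfect, despite 3x more
                                     # false positives than true positives
print(f"precision   = {prec:.3f}")   # 0.250
```

Because the true-negative count dwarfs every other cell of the confusion matrix, specificity stays near 1 for almost any detector, which is why a single classical metric (or even accuracy) cannot summarize performance here and why the paper pairs two metrics or replaces them with proficiency.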