Inferring missing links in partially observed social networks

Determining the pattern of links within a large social network is often problematic because data collection and analysis are labour-intensive. With constrained data collection capabilities it is often only possible either to make detailed observations of a limited number of individuals in the network or to make fewer observations of a larger number of people. Previously we have shown how detailed observation of a small network can be used to infer where previously unconnected individuals are likely to fit into the network, thereby attempting to predict network growth as new people are considered for inclusion. Here, by contrast, we show how social network topology can be inferred from a limited observation of a large network. Essentially, the problem is one of inferring the presence of links that are missed during a constrained data collection campaign. This is particularly difficult for organizations that actively seek to remain covert and undetected. However, it is often very useful to know whether two given individuals are likely to be connected even when limited surveillance effort yields no evidence of a link. Specifically, we show how a statistical inference technique can be used to successfully predict the existence of links that are missed during network sampling. The procedure is demonstrated using network data obtained from open-source publications.
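The abstract does not name the statistical inference technique, so the sketch below only illustrates the general shape of the missing-link problem. It substitutes a simple, widely used neighbourhood-overlap heuristic, the Jaccard coefficient. This is an assumption for illustration and not the authors' method. Each pair of individuals with no observed link receives a score equal to the fraction of their combined contacts that they share, and the highest-scoring pairs are treated as the most likely missed links. The function names `build_adjacency`, `jaccard` and `rank_missing_links` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from observed (u, v) link pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Fraction of the pair's combined neighbours that they share.

    Stand-in heuristic (assumption): not the inference technique of the paper.
    """
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def rank_missing_links(edges, top_k=5):
    """Score every unobserved pair and return the top_k candidate links."""
    adj = build_adjacency(edges)
    candidates = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # link already observed, so there is nothing to infer
        score = jaccard(adj, u, v)
        if score > 0:
            candidates.append((score, u, v))
    # Highest score first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, s) for s, u, v in candidates[:top_k]]


if __name__ == "__main__":
    # Hypothetical partially observed network (not data from the paper).
    observed = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
    for u, v, s in rank_missing_links(observed, top_k=3):
        print(f"{u}-{v}: {s:.2f}")
```

Here B and C share every observed contact (A and D), so their unobserved pair ranks first. A and E share no contacts and are never proposed. A statistical model fitted to the sampling process would refine such scores into probabilities. This heuristic only ranks candidates.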