Statistical relational learning for single network domains
Many domains exhibit natural relational structure, from the World Wide Web to scientific publications and social and biological systems. We call such domains single network domains, because the whole domain can be represented as a single network in which every data instance is potentially linked to other instances. In such domains, the relational links between data instances often indicate probabilistic dependencies between the attribute values of the instances, which machine learning algorithms can exploit to improve modeling and prediction accuracy.
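As a concrete illustration of how link structure can be exploited (this is a generic technique, not the method proposed in the thesis), the following sketch implements a simple relational neighbor classifier that predicts an unlabeled node's class by majority vote over its labeled neighbors; the network and labels are hypothetical toy data.

```python
# Minimal sketch of a relational neighbor classifier: a node's label is
# predicted from the labels of its linked neighbors, exploiting the
# dependency between linked instances. Toy data, for illustration only.
from collections import Counter

def relational_neighbor_predict(node, edges, labels):
    """Predict a node's label by majority vote over its labeled neighbors."""
    neighbors = [v for u, v in edges if u == node] + \
                [u for u, v in edges if v == node]
    votes = Counter(labels[n] for n in neighbors if n in labels)
    return votes.most_common(1)[0][0] if votes else None

# Toy network: node 3 is unlabeled; its neighbors (nodes 1 and 2) are "A".
edges = [(1, 3), (2, 3), (2, 4)]
labels = {1: "A", 2: "A", 4: "B"}
print(relational_neighbor_predict(3, edges, labels))  # → "A"
```

An IID classifier would have to ignore the edges entirely; here the link structure alone is enough to make a prediction for node 3.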
Because of these dependencies, it is no longer suitable to assume that data instances from a single network domain are independent and identically distributed (IID). Moreover, statistical relational learning (SRL) methods for single networks often model the joint distribution over all data instances, so these models do not fit the conventional paradigm of machine learning either. We argue that single network domains deserve more attention: classic learning theory no longer applies, and compared with IID domains there are a number of new factors that affect the performance of relational learning. By disentangling these factors and formally analyzing SRL methods, there is a great opportunity to advance the state of the art. As a preliminary attempt in this area, this thesis research has achieved the following goals: (1) establishing theoretical results on the learnability of relational models by formulating suitable assumptions about the weakness of infinite-range dependence, (2) developing new relational learning and inference algorithms that improve classification performance and scale to large networks, and (3) developing novel relational model representations based on assumptions that better fit real-world networks.