Outlier detection in graph streams

A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities that differ from the “typical” behavior of the underlying network. In this paper, we provide first results on the problem of structural outlier detection in massive network streams. The problem is inherently challenging because of the high volume of the underlying network stream, and the stream scenario also increases the computational demands on any approach. We use a structural connectivity model to define outliers in graph streams. To handle the sparsity problem of massive networks, we dynamically partition the network so that statistically robust models of the connectivity behavior can be constructed. We design a reservoir sampling method to maintain structural summaries of the underlying network. These structural summaries are designed to yield robust, dynamic, and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.
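
As a rough illustration of the reservoir sampling idea mentioned above, the following minimal sketch keeps a fixed-size uniform sample of edges from a stream using the classic one-pass reservoir algorithm. It is not the method of this paper, whose details are not given in the abstract; the function name `reservoir_sample_edges` and the parameters `capacity` and `seed` are hypothetical, and edges are assumed to arrive as `(source, target)` pairs.

```python
import random


def reservoir_sample_edges(edge_stream, capacity, seed=None):
    """Keep a uniform random sample of at most `capacity` edges from a stream.

    Classic one-pass reservoir sampling: after t edges have been seen,
    each of them is in the reservoir with probability capacity / t.
    (Illustrative only; not the paper's structural reservoir design.)
    """
    rng = random.Random(seed)
    reservoir = []
    for t, edge in enumerate(edge_stream, start=1):
        if len(reservoir) < capacity:
            # The first `capacity` edges fill the reservoir directly.
            reservoir.append(edge)
        else:
            # Pick a slot uniformly from [0, t); the new edge replaces
            # an old one only if the slot falls inside the reservoir.
            j = rng.randrange(t)
            if j < capacity:
                reservoir[j] = edge
    return reservoir


if __name__ == "__main__":
    # Hypothetical toy stream: a path graph on 1001 nodes.
    stream = ((i, i + 1) for i in range(1000))
    sample = reservoir_sample_edges(stream, capacity=10, seed=0)
    print(len(sample), "edges kept in the summary")
```

A fixed-capacity sample like this bounds memory regardless of stream length, which is the property that makes reservoir-style summaries attractive for high-volume graph streams.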
