Factor analysis of Internet traffic destinations from similar source networks

Purpose – This study assesses whether similar user populations on the Internet produce similar geographical traffic destination patterns on a per‐country basis.

Design/methodology/approach – The authors collected a country‐wide NetFlow trace that encompasses the whole Spanish academic network. The trace comprises several campus networks that are similar in population size and structure. To compare their behavior, the authors propose a mixture model, based primarily on the Zipf‐Mandelbrot power law, to capture the heavy‐tailed nature of the per‐country traffic distribution. Factor analysis is then performed to understand the relation between the response variable (number of bytes or packets per day) and explanatory factors such as the source IP network, traffic direction, and country.

Findings – Surprisingly, the results show that the geographical distribution is strongly dependent on the source IP network. Furthermore, even though there are thousands of users in a typical campus networ...
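
The Zipf‐Mandelbrot law named in the methodology assigns rank k (here, the k‐th most popular destination country) a probability proportional to (k + q)^(-s). The following is a minimal sketch of that single component only, not the authors' mixture model or their fitting procedure. The function name and the parameter values (`n_ranks`, `s`, `q`) are hypothetical and chosen purely for illustration.

```python
def zipf_mandelbrot_pmf(n_ranks, s, q):
    """Return [p(1), ..., p(n_ranks)] with p(k) proportional to (k + q) ** -s.

    n_ranks: number of ranked items (e.g. destination countries), assumed >= 1
    s:       decay exponent, assumed > 0
    q:       rank shift that flattens the head of the distribution, assumed >= 0
    """
    if n_ranks < 1 or s <= 0 or q < 0:
        raise ValueError("expected n_ranks >= 1, s > 0 and q >= 0")
    # Unnormalized weights for ranks 1..n_ranks.
    weights = [(k + q) ** -s for k in range(1, n_ranks + 1)]
    total = sum(weights)
    # Normalize so the probabilities sum to 1.
    return [w / total for w in weights]


# Illustrative parameters only; they are not taken from the study.
pmf = zipf_mandelbrot_pmf(n_ranks=200, s=1.5, q=2.0)

# Share of traffic that the ten top-ranked countries would receive
# under these assumed parameters.
top10_share = sum(pmf[:10])
```

With q = 0 the expression reduces to the plain Zipf law; a larger q lowers the relative weight of the first few ranks while the tail still decays as a power law.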
