Power law in XML schema metrics

Software metrics are vital for managing software development, especially when a new technology is being adopted and its best practices have yet to be established. XML Schema is a relatively new technology that has been widely adopted in software development. Despite its use on almost every kind of programming platform, its usage patterns have not yet been fully investigated. Using two large sets of real XML Schemas, this thesis studies the distribution of several schema metrics and the structure of some large schemas. Elements in a schema are connected by their usage links, so the interconnected elements can be viewed as a network, or graph, of elements. This thesis also studies the structural properties of that network, including its scale-free property, its connectivity, and its small-world effect.
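As a rough illustration of the network view described above, the following Python sketch builds a graph from element usage links and computes two of the quantities such a study would look at: the in-degree of each element (the input to a scale-free check) and the average shortest-path length (a small-world indicator). It is not the thesis's method. The function names, the `(user, used)` link format, and the choice of a continuous maximum-likelihood estimate for the power-law exponent are all assumptions made for this example.

```python
import math
from collections import Counter, deque


def degree_counts(links):
    """In-degree of every element, given directed (user, used) link pairs."""
    indeg = Counter()
    for user, used in links:
        indeg[used] += 1
        indeg[user] += 0  # make sure elements that are never used still appear
    return indeg


def powerlaw_alpha(degrees, xmin=1):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Assumption: this is only a rough approximation for discrete degree data.
    Returns None when every value in the tail equals xmin.
    """
    tail = [d for d in degrees if d >= xmin]
    log_sum = sum(math.log(d / xmin) for d in tail)
    if log_sum == 0:
        return None
    return 1 + len(tail) / log_sum


def average_path_length(links):
    """Mean shortest-path length over reachable pairs, ignoring link direction."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total = pairs = 0
    for src in adj:
        # Breadth-first search from src gives hop counts to reachable nodes.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical usage links: each pair means "first element uses second element".
example_links = [
    ("order", "item"),
    ("order", "address"),
    ("invoice", "address"),
    ("customer", "address"),
    ("item", "price"),
]
example_indegree = degree_counts(example_links)
example_alpha = powerlaw_alpha([d for d in example_indegree.values() if d > 0])
example_apl = average_path_length(example_links)
```

On real schema collections, a heavy-tailed in-degree distribution combined with a short average path length would be consistent with the scale-free and small-world properties named in the abstract. A toy graph of this size says nothing either way.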
