Network Report: A Structured Description for Network Datasets

The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers share only links, while others provide some context or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Inappropriate use of a network dataset can lead to severe consequences (e.g., discrimination), especially when machine learning models on networks are deployed in high-stakes domains. Reporting is challenging because networks are used across different domains (e.g., network science, physics) and have complex structures. To facilitate communication between network dataset providers and consumers, we propose the network report. A network report is a structured description that summarizes and contextualizes a network dataset. The network report extends the idea of dataset reports from prior work (e.g., Datasheets for Datasets) with network-specific descriptions of, for example, the non-i.i.d. nature of the data, demographic information, and network characteristics. We hope network reports encourage transparency and accountability in network research and development across different fields.
