Where Are the (Cellular) Data?

New generations of cellular networks are data oriented and target the integration of machine learning and artificial intelligence solutions. Data availability, which is required to train and compare machine-learning-based networking solutions, is therefore becoming both an important topic and a significant concern. Operators do collect data, but they rarely share them because of privacy concerns. This article first reviews the few publicly available cellular datasets, whose release triggered bursts of innovation. Such data are so scarce that researchers have resorted to collecting network data with their own in-house tools, which are covered in the second part of this survey.
