A public data set of spatio-temporal match events in soccer competitions

Abstract

Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, at both the individual and the collective level, and the determinants of success and failure.

Measurement(s): physical activity
Technology Type(s): visual observation method
Sample Characteristic - Location: Germany • Kingdom of Spain • French Republic • Italy • England

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.9711164
