The big data analysis of rail equipment accidents based on the maximal information coefficient

Abstract As more electrical and electronic equipment is applied to the railway system, much more data can be collected, and the railway industry is entering the big data era. Using the maximal information coefficient (MIC), this paper performs a big data analysis of rail equipment accidents to investigate the effect of updating rail equipment. The rail equipment accident data set covering 25 years (1990 to 2014) is divided into three subsets according to the period in which the accidents occurred. For each subset, the factors contributing to accident damage, to accident severity, and to accident cause are analyzed separately. The results show that the trend in the number of rail equipment accidents is more consistent with the variation in railroad service miles than with the variation in carloads. The highway-rail grade crossing factor is also found to be important, which agrees with observed facts. However, a seemingly surprising result is that the number of factors contributing to accident severity and to accident cause increases over time as more equipment is applied to the railway system.
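
The abstract relies on MIC without stating how the score is computed. As a rough illustration only, the following plain-Python sketch computes a simplified, equal-frequency-grid approximation of MIC. It is not the MINE search and not necessarily the procedure used in this study. The function names (`equal_frequency_bins`, `mutual_information`, `approx_mic`) are hypothetical, and the grid budget `alpha = 0.6` is assumed from the default B(n) = n^0.6 in Reshef et al.'s original MIC paper. For each grid size nx-by-ny with nx * ny < B(n), the sketch bins both variables at sample quantiles, computes the empirical mutual information, normalizes it by log(min(nx, ny)), and keeps the maximum.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def equal_frequency_bins(values, k):
    """Map each value to one of k interval bins cut at the i/k sample quantiles.

    Cuts are taken from the sorted data and applied by value, so tied values
    always share a bin and the result is a genuine partition of the axis.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(i * n) // k] for i in range(1, k)]
    return [bisect_right(cuts, v) for v in values]


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two equal-length label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    # Sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v])) for (u, v), c in joint.items())


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC: max of I / log(min(nx, ny)) over equal-frequency nx-by-ny
    grids with nx * ny < n**alpha. Only one grid per (nx, ny) is tried, unlike MINE."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    bound = n ** alpha  # assumed grid-size budget B(n) = n**alpha
    cache = {}  # (axis, k) -> bin labels, so each partition is computed once

    def bins(axis, k):
        if (axis, k) not in cache:
            cache[(axis, k)] = equal_frequency_bins(x if axis == 0 else y, k)
        return cache[(axis, k)]

    best = None
    nx = 2
    while nx * 2 < bound:
        ny = 2
        while nx * ny < bound:
            score = mutual_information(bins(0, nx), bins(1, ny)) / math.log(min(nx, ny))
            best = score if best is None else max(best, score)
            ny += 1
        nx += 1
    if best is None:
        raise ValueError("too few points: no grid satisfies nx * ny < n**alpha")
    return best


if __name__ == "__main__":
    xs = [-1 + 2 * i / 399 for i in range(400)]
    print(approx_mic(xs, [v * v for v in xs]))  # noiseless parabola: high score
    rng = random.Random(0)
    noise_x = [rng.random() for _ in range(400)]
    noise_y = [rng.random() for _ in range(400)]
    print(approx_mic(noise_x, noise_y))  # independent noise: low score
```

Trying a single quantile grid per (nx, ny) keeps the sketch short, but it explores far fewer partitions than the dynamic-programming search in MINE. It therefore tends to report lower scores than a full MIC implementation such as minepy.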
