Nonparametric data reduction approach for large-scale survival data analysis

In the era of big data, analyzing large and complex data sets consumes time and money and can lead to errors and misinterpretation. Inaccurate or erroneous reasoning in turn leads to poor inference and decision making, sometimes with irreversible and catastrophic consequences. Conversely, proper management and use of valuable data can significantly increase knowledge and reduce cost through preventive action. In this field, time-to-event (survival) data analysis is at the core of risk assessment and plays an essential role in predicting the probability of events such as the failure of a device or component. Thus, when data are large-scale and complex, particularly in the number of variables, it is desirable to apply methods that efficiently simplify the data before any analysis. In this paper we propose an applied data reduction approach that performs variable selection in high-dimensional, large-scale data in order to avoid the aforementioned difficulties in decision making and to facilitate survival and failure analysis. The approach is intended for risk assessment and decision making in complex, large-scale survival data analysis.
