Large-Scale Data Analysis Using Heuristic Methods

Estimation and modelling problems arising in many areas of data analysis often turn out to be unstable and/or intractable by standard numerical methods. Such problems frequently occur when fitting large data sets to a given model and in predictive learning. Heuristics are general recommendations based on practical statistical evidence, in contrast to a fixed set of rules that cannot vary but is guaranteed to give the correct answer. Although these methods have become more standard in several fields of science, their use for estimation and modelling in statistics still appears to be limited. This paper surveys a set of problem-solving strategies, guided by heuristic information, that are expected to be used more frequently. It promotes the use of recent advances in different fields of large-scale data analysis, focusing on applications in medicine, biology and technology.
