The role of data reduction for diagnosis of pathologies of the vertebral column by using supervised learning algorithms

Data mining research today routinely confronts large amounts of data. These data often contain redundant and irrelevant items that must be removed before a learning task in order to obtain good accuracy. More powerful computers do not solve the problems posed by this ever-growing volume of data. It is therefore crucial to find techniques that can handle databases that are often too large to process. Data reduction is thus an important step in preparing data for data mining and knowledge discovery. In this paper we present a comparative study on original and reduced data to assess the role of data reduction in a learning task. For this purpose, we used a medical dataset, specifically a vertebral column pathologies database.
