Maintaining imbalance highly dependent medical data using Dirichlet process data generation

Imbalance between the sizes of different classes is an important issue in classification problems. One well-known data balancing technique is artificial oversampling, which increases the size of the dataset. In this research, multinomial classification was applied to classify recorded features obtained from a single ECG (electrocardiograph) sensor. To generate the additional data, a Dirichlet process, under which the cumulative distribution function over each data partition follows a Dirichlet distribution, was used to model the distribution of the newly generated data while also taking into account the statistical properties of the existing data. After data balancing, classification reached 77.21% classification accuracy (CA) and 90.9% area under the ROC curve (AUC).
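The abstract does not spell out the generation procedure, so the following is only a minimal sketch of one common way to draw new samples under a Dirichlet process: the Polya urn (posterior predictive) scheme. It is not the authors' implementation. The function name dp_oversample, the concentration parameter alpha, the choice of a multivariate normal base measure fitted to the minority class, and the example feature values are all assumptions made for illustration.

```python
import numpy as np

def dp_oversample(X, n_new, alpha=1.0, rng=None):
    """Draw n_new synthetic rows via a Polya urn scheme (hypothetical helper).

    Each new row is drawn from the base measure with probability
    alpha / (alpha + n), and otherwise copied from a uniformly chosen row
    already in the urn, where n is the current urn size.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    pool = [row for row in X]            # urn starts with the observed rows
    # Assumed base measure: multivariate normal with the sample mean and
    # covariance, so draws keep the dependence between features.
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    new_rows = []
    for _ in range(n_new):
        n = len(pool)
        if rng.random() < alpha / (alpha + n):
            row = rng.multivariate_normal(mean, cov)   # fresh draw from base measure
        else:
            row = pool[rng.integers(n)]                # reuse a row already in the urn
        pool.append(row)
        new_rows.append(row)
    return np.array(new_rows)

# Illustrative minority-class rows (made-up values, two features per record).
minority = np.array([[60.0, 0.80], [62.0, 0.82], [75.0, 0.60], [80.0, 0.55]])
synthetic = dp_oversample(minority, n_new=6, alpha=0.5, rng=0)
balanced_minority = np.vstack([minority, synthetic])
```

A small alpha keeps the generated rows close to the observed data, while a larger alpha draws more often from the fitted base measure. Because whole rows are copied or drawn jointly, dependence between features is carried over rather than sampled feature by feature.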
