Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset

Supervised machine learning often encounters the imbalanced class distribution problem, in which one class is under-represented compared to the others. In many real-life problem domains, data with an imbalanced class distribution also contains ambiguous regions of the data space where the prior probabilities of two or more classes are approximately equal. This second problem, known as overlapping classes, makes the classification task harder for learners. This chapter explores the intersection of the imbalanced class and overlapping classes problems, with Smart Environments as the application domain. In smart environments, delivering in-home interventions to residents, in the form of timely reminders or brief instructions that help ensure successful completion of daily activities, is an ideal scenario for studying this combined problem. As a solution, a novel clustering-based under-sampling (ClusBUS) technique is proposed. A density-based clustering technique, DBSCAN, is used to identify “interesting” clusters in the instance space. Under-sampling is then performed on these clusters, which are selected by applying a threshold to the degree of minority class dominance within each cluster.
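The clustering-then-under-sampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the abstract does not say which instances are discarded, so the sketch assumes that majority class instances are removed from clusters whose minority fraction meets the threshold. The function names, the "prompt" / "no-prompt" labels, the toy points, and the values of eps, min_pts, and threshold are all hypothetical.

```python
from collections import Counter
from math import dist

NOISE = -1  # cluster id given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per point (NOISE for noise)."""
    n = len(points)
    # Brute-force eps-neighbourhoods; each point counts as its own neighbour.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    ids = [None] * n
    cluster = 0
    for i in range(n):
        if ids[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            ids[i] = NOISE  # may later be claimed as a border point
            continue
        ids[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if ids[j] == NOISE:
                ids[j] = cluster  # border point: joins but does not expand
            if ids[j] is not None:
                continue
            ids[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return ids


def clusbus_undersample(points, labels, minority, eps, min_pts, threshold):
    """Assumed rule: drop majority instances from minority-dominated clusters."""
    ids = dbscan(points, eps, min_pts)
    sizes = Counter(c for c in ids if c != NOISE)
    minority_counts = Counter(
        c for c, y in zip(ids, labels) if c != NOISE and y == minority)
    # A cluster is "interesting" when its minority fraction meets the threshold.
    interesting = {c for c in sizes if minority_counts[c] / sizes[c] >= threshold}
    keep = [i for i, (c, y) in enumerate(zip(ids, labels))
            if y == minority or c not in interesting]
    return keep, interesting


# Toy data: one mixed cluster, one pure majority cluster, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = ["prompt", "prompt", "prompt", "no-prompt",
          "no-prompt", "no-prompt", "no-prompt", "no-prompt",
          "no-prompt"]

keep, interesting = clusbus_undersample(
    points, labels, minority="prompt", eps=1.5, min_pts=3, threshold=0.5)
```

In this toy run only the mixed cluster qualifies as interesting, so the single "no-prompt" instance inside it is the only one removed. Noise points and instances in majority-dominated clusters are left untouched, which is a design choice of this sketch rather than something the abstract specifies.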
