Empirical analysis of classifiers and feature selection techniques on mobile phone data activities

Mobile phones have become ubiquitous devices that, with additional hardware and software features, are no longer used only for communication. Many activities can be captured with a mobile phone, yielding a large number of features. However, not all of these features benefit processing and analysis. In some cases a large number of features lowers accuracy, and a large feature set also takes longer to build a model. This paper analyzes the impact on accuracy of selected feature selection techniques and classifiers applied to mobile phone activity data, and evaluates the methods. Furthermore, by applying feature selection and examining its impact on the classification accuracy of each classifier, the features worth using can be determined. Finding a suitable combination of classifier and feature selection technique is sometimes crucial. A series of accuracy tests on feature selection, conducted in Weka, shows consistent results, although with a different order of features. The results show that the combination of the K* algorithm and correlation feature selection performs best, achieving a high accuracy rate while producing a smaller feature subset.
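
The comparison described above can be illustrated with a minimal sketch. It is not the authors' Weka setup. It assumes a simple filter that ranks features by absolute Pearson correlation with the class label (a stand-in for Weka's correlation feature selection, which also accounts for redundancy between features), and a 1-nearest-neighbour classifier (a stand-in for K*, which uses an entropic distance). The synthetic data and all function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_k_by_correlation(X, y, k):
    """Indices of the k features with the largest |correlation| to the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def loo_accuracy_1nn(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, other in enumerate(X):
            if i == j:
                continue  # hold out the query instance itself
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def make_toy_data(n=120, n_noise=8, seed=0):
    """Hypothetical data: features 0 and 1 depend on the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
# For brevity the filter runs once on all rows; a stricter evaluation
# would repeat the selection inside each held-out fold.
selected = top_k_by_correlation(X, y, k=2)
acc_all = loo_accuracy_1nn(X, y, range(len(X[0])))
acc_selected = loo_accuracy_1nn(X, y, selected)
print("selected features:", sorted(selected))
print("accuracy, all features:", round(acc_all, 3))
print("accuracy, selected features:", round(acc_selected, 3))
```

Running the same evaluation with and without the filter is the kind of comparison the abstract reports: accuracy is measured once on the full feature set and once on the reduced subset, so the effect of selection on a given classifier can be read off directly.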
