Mutual information and sensitivity analysis for feature selection in customer targeting: A comparative study

Feature selection is a key task in any data-driven knowledge discovery project. This research analyses the advantages and disadvantages of mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems by applying both to a bank telemarketing case. For each technique, a logistic regression model is built on the tuned set of features that the technique identifies as most influential on the success of a telemarketing contact: 13 features for MI and 9 for DSA. DSA performs better at lower false-positive rates, while MI is slightly better at higher false-positive rates. MI is therefore the better choice when the aim is to reduce the cost of contacts slightly without risking the loss of many successful contacts. DSA, however, achieved good prediction results with fewer features.
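To make the MI side of the comparison concrete, the following is a minimal sketch of ranking discrete features by their empirical mutual information with a binary outcome, using the standard definition I(X;Y) = sum over (x, y) of p(x,y) log2[p(x,y) / (p(x) p(y))]. It is an illustration only: the function names (`mutual_information`, `rank_features`), the feature names and the toy records are hypothetical, and the abstract does not state which MI estimator, discretisation or tuning procedure the study used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, target):
    """Return (feature, MI) pairs sorted from most to least informative."""
    names = [k for k in rows[0] if k != target]
    ys = [r[target] for r in rows]
    scores = {
        name: mutual_information([r[name] for r in rows], ys)
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records, NOT the bank telemarketing data from the study.
rows = [
    {"contact": "cellular", "housing": "yes", "subscribed": 1},
    {"contact": "cellular", "housing": "no", "subscribed": 1},
    {"contact": "landline", "housing": "yes", "subscribed": 0},
    {"contact": "landline", "housing": "no", "subscribed": 0},
]

ranking = rank_features(rows, "subscribed")
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy example `contact` determines the outcome exactly and scores 1 bit, while `housing` is independent of it and scores 0. In practice, the top-ranked features would then be passed to a classifier such as logistic regression, with the number of retained features tuned on validation data.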
