Knowledge extraction from web-based consumer surveys: Bayesian networks with feature selection

Large-scale internet surveys have become very popular over the last two decades as the internet has evolved. Such surveys often contain multiple objective variables whose relationships are not known in advance. Although various statistical methods are used for marketing analyses, conventional statistical methods are not designed to handle multiple objective variables. This paper proposes a method of extracting useful knowledge from a multi-objective survey dataset by performing Bayesian network modelling accompanied by feature selection, in which Cramer's coefficient of association (Cramer's V) is used as the information index. A marketer acting as a domain expert subjectively decides which features to use in the Bayesian networks, referring primarily to the Cramer's V ranking of the explanatory variables and supplementarily to the Cramer's V values of selected combinations of variables. The method aims not only to find a feature subset that accurately classifies the objective variables but also to find one that reveals knowledge of consumer behaviour and hence leads to concrete marketing actions. The proposed method was verified using survey data on health consciousness and private medical insurance.
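The ranking step that the marketer consults can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes categorical survey answers stored as plain Python lists, computes Cramer's V as V = sqrt(chi2 / (n * (min(r, c) - 1))) without any bias correction, and scores a "combination of variables" by treating a pair of features as one joint categorical variable. That pairing scheme is an assumption. The helper names `cramers_v`, `rank_features` and `rank_pairs` and the toy data are hypothetical. The code only produces rankings; the final feature choice remains the domain expert's subjective decision.

```python
import math
from collections import Counter
from itertools import combinations


def cramers_v(x, y):
    """Cramer's V between two equal-length categorical sequences (no bias correction)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_counts = Counter(x)
    y_counts = Counter(y)
    joint = Counter(zip(x, y))
    # Pearson chi-square statistic over the full r x c contingency table.
    chi2 = 0.0
    for xv, xc in x_counts.items():
        for yv, yc in y_counts.items():
            expected = xc * yc / n
            observed = joint.get((xv, yv), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * (k - 1)))


def rank_features(features, target):
    """Rank explanatory variables (name -> list of answers) by Cramer's V with the target."""
    scores = {name: cramers_v(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def rank_pairs(features, target):
    """Hypothetical supplement: score each feature pair as a single joint variable."""
    scores = {}
    for a, b in combinations(sorted(features), 2):
        joint_values = list(zip(features[a], features[b]))
        scores[(a, b)] = cramers_v(joint_values, target)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up survey answers for one objective variable.
target = ["yes", "no", "yes", "no", "yes", "no"]
features = {
    "age_group": ["young", "old", "young", "old", "young", "old"],
    "smoker": ["y", "y", "n", "n", "y", "n"],
}
print(rank_features(features, target))
print(rank_pairs(features, target))
```

In this toy data `age_group` separates the objective variable perfectly (V = 1.0) while `smoker` is only weakly associated (V = 1/3), so `age_group` tops the ranking. With several objective variables, one such ranking would be computed per objective variable.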