UNDERSTANDING THE APPLICABILITY OF LINEAR & NON-LINEAR MODELS USING A CASE-BASED STUDY

This paper uses a case-based study, "product sales estimation," on real-time data to examine the applicability of linear and non-linear models in machine learning and data mining. A systematic approach is used to address the problem of estimating sales for a particular set of products across multiple categories by applying both linear and non-linear machine learning techniques to a data set of features selected from the original data. Feature selection reduces the dimensionality of the data set by excluding features that contribute minimally to predicting the dependent variable. The models are then trained using several techniques, among the strongest in the linear and non-linear domains respectively. Data remodeling is then performed to extract new features by restructuring the data set, and the performance of the models is evaluated again; remodeling often plays a crucial role in boosting classifier accuracy by changing the properties of the given data set. Finally, we explore and analyze the reasons why one model performs better than the other, and thereby develop an understanding of when linear and non-linear machine learning models are applicable. Alongside this primary goal, we also aim to find the classifier with the best possible accuracy for product sales estimation in the given scenario.
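The contrast between linear and non-linear models described above can be illustrated with a minimal sketch (not the paper's actual pipeline): synthetic sales data with a price threshold effect, where a single-split regression "stump" (the simplest non-linear tree model) captures structure that a least-squares line cannot. The data-generating process and all variable names here are assumptions for illustration only.

```python
import random

random.seed(0)
n = 200
# Hypothetical feature: product price; sales drop sharply past a threshold,
# giving the data a non-linear (step-like) shape.
price = [random.uniform(1, 10) for _ in range(n)]
sales = [(100.0 if p < 5 else 40.0) + random.gauss(0, 2) for p in price]

# Linear model: simple least-squares line, sales ~ a + b * price
mean_p = sum(price) / n
mean_s = sum(sales) / n
b = sum((p - mean_p) * (s - mean_s) for p, s in zip(price, sales)) / \
    sum((p - mean_p) ** 2 for p in price)
a = mean_s - b * mean_p
linear_pred = [a + b * p for p in price]

# Non-linear model: best single threshold split (a one-node regression tree)
def stump_fit(x, y):
    best_t, best_sse, means = None, float("inf"), None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if sse < best_sse:
            best_t, best_sse, means = t, sse, (ml, mr)
    return best_t, means

t, (ml, mr) = stump_fit(price, sales)
stump_pred = [ml if p < t else mr for p in price]

def mse(pred):
    return sum((s, q) == (s, q) and (s - q) ** 2 for s, q in zip(sales, pred)) / n

mse_linear = sum((s - q) ** 2 for s, q in zip(sales, linear_pred)) / n
mse_stump = sum((s - q) ** 2 for s, q in zip(sales, stump_pred)) / n
print(f"linear MSE: {mse_linear:.1f}, stump MSE: {mse_stump:.1f}")
```

On this kind of threshold-shaped data the stump achieves a far lower mean squared error than the fitted line, mirroring the paper's point that model applicability depends on the structure of the data.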
