Text-based machine learning using discriminative classifiers

Ever since the invention of the computer, there has been curiosity about whether it can be made to learn. If we could understand how to program computers to learn and to improve automatically with experience, the impact would be dramatic. A successful understanding of how to make computers learn would open up many new uses of computers and new levels of competence and customization. In this paper, two applications of machine learning are explored. In the first, linear regression is used to measure the correlation of each feature column with the output and to make predictions based on the "line of best fit". In the second, discriminative classifiers are proposed for analyzing and segregating text-based data. Applying regression analysis to advertising data shows that TV advertising has the strongest linear correlation with sales. In the later section, text-based machine learning is carried out using the scikit-learn library for Python. Multiple contemporary classifiers are applied to a set of SMS messages to perform spam detection, and their performance is evaluated using suitable accuracy metrics. The results show that the Naive Bayes algorithm is much faster than other algorithms such as Logistic Regression. Using a Bayesian probabilistic approach, a spam ratio is attached to every token in the input set. The proposed work proves helpful in the fields of advertising and spam detection systems.