Images meant for marketing and promotional purposes (i.e., coupons) are a basic means of incentivizing customers to visit shopping outlets and purchase discounted commodities. They also help department stores attract more customers and, potentially, speed up their cash flow. Coupons are available from various sources (print, web, etc.), and categorizing them benefits users. We are interested in an automatic categorizer system that aggregates coupons from different sources (web, digital coupons, paper coupons, etc.) and efficiently assigns a type to each coupon. While there are several dimensions to this problem, in this paper we study the problem of accurately categorizing the coupons. We propose four techniques for categorizing coupons, namely a word-based model, an n-gram-based model, an externally weighing model, and a weight decaying model, all of which take advantage of known machine learning algorithms. In our evaluation, these techniques achieve accuracies ranging from 73.1% to 93.2%. We provide various examples of accuracy optimizations that can be performed and show a progressive increase in categorization accuracy on our test dataset.
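To make the difference between the word-based and n-gram-based models concrete, the following is a minimal sketch rather than the method evaluated in the paper. It assumes a multinomial naive Bayes classifier with add-one smoothing as a stand-in for the unspecified "known machine learning algorithms"; the function `features`, the class `NaiveBayes`, the category labels, and the toy coupon texts are all hypothetical and introduced only for illustration. The externally weighing and weight decaying models are not sketched because the abstract does not describe them.

```python
import math
from collections import Counter, defaultdict


def features(text, n=1):
    """Return word n-grams of orders 1..n from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for k in range(1, n + 1):
        grams += [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    return grams


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in classifier)."""

    def __init__(self, n=1):
        self.n = n                                  # n=1: word-based, n>1: n-gram-based
        self.class_counts = Counter()               # documents seen per category
        self.feature_counts = defaultdict(Counter)  # feature counts per category
        self.vocab = set()                          # all features seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for f in features(text, self.n):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for f in features(text, self.n):
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for text extracted from coupons.
train_texts = [
    "save 2 dollars on any frozen pizza",
    "buy one get one free cereal",
    "20 percent off mens jeans",
    "free shipping on shoes and jackets",
]
train_labels = ["grocery", "grocery", "apparel", "apparel"]

word_model = NaiveBayes(n=1).fit(train_texts, train_labels)    # word-based
bigram_model = NaiveBayes(n=2).fit(train_texts, train_labels)  # words plus bigrams

print(word_model.predict("frozen pizza coupon"))
print(bigram_model.predict("20 percent off shoes"))
```

The only difference between the two models here is the feature extractor: with `n=2`, adjacent word pairs such as "frozen pizza" become features alongside the individual words, which lets the classifier use short phrases as evidence for a category.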