Pricing fraud detection in online shopping malls using a finite mixture model

Abstract: Although detecting pricing fraud is important for improving the service quality of online shopping malls, research on automatic fraud detection has been limited. In this paper, we propose an unsupervised learning method based on a finite mixture model to identify pricing fraud. Using a known number of item clusters, we assign each item one of two states, normal or fraud, according to whether its description is consistent with its price. The two states of an observed item are modeled as hidden variables, and the proposed models estimate the state with an expectation-maximization (EM) algorithm. We also present a special case of the proposed model that is applicable when the number of item clusters is unknown. The experimental results show that the proposed models identify pricing fraud more effectively than existing outlier detection methods. They also show that using the number of clusters improves pricing fraud detection performance.
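
As a rough illustration of the two-state idea, the sketch below runs EM on a one-dimensional toy problem. It is not the model proposed in the paper: the function name `fraud_posteriors`, the use of log-prices within a single item cluster, the Gaussian "normal" state, and the uniform "fraud" state are all assumptions made here for illustration only.

```python
import math
import statistics


def fraud_posteriors(values, n_iter=100, prior_fraud=0.1):
    """Return P(state = fraud | value) for each value, estimated with EM.

    Illustrative assumption (not the paper's model): the normal state is a
    Gaussian, and the fraud state is uniform over the observed value range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # no spread, so nothing can be flagged
    uniform_density = 1.0 / (hi - lo)
    mean = statistics.median(values)
    var = statistics.pvariance(values)
    pi_fraud = prior_fraud
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of the hidden fraud state per item.
        resp = []
        for x in values:
            gauss = math.exp(-(x - mean) ** 2 / (2.0 * var))
            gauss /= math.sqrt(2.0 * math.pi * var)
            p_normal = (1.0 - pi_fraud) * gauss
            p_fraud = pi_fraud * uniform_density
            resp.append(p_fraud / (p_normal + p_fraud))
        # M-step: re-estimate the Gaussian and the mixing weight.
        w = [1.0 - r for r in resp]
        total_w = sum(w)
        mean = sum(wi * x for wi, x in zip(w, values)) / total_w
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, values)) / total_w
        var = max(var, 1e-6)  # floor to avoid a degenerate Gaussian
        pi_fraud = min(max(sum(resp) / len(values), 1e-6), 1.0 - 1e-6)
    return resp


# Hypothetical log-prices for items in one cluster; the last is far too low.
log_prices = [10.0, 10.1, 9.9, 10.2, 9.8, 10.05, 9.95, 10.1, 9.9, 3.0]
posteriors = fraud_posteriors(log_prices)
suspects = [i for i, p in enumerate(posteriors) if p > 0.5]
print(suspects)
```

In this toy input the last value lies far from the others, so it receives the highest fraud posterior. The 0.5 threshold is an arbitrary choice for the example.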
