A CONDITIONAL FEATURE UTILIZATION APPROACH TO ITEMSET RETRIEVAL IN ONLINE SHOPPING SERVICES

As the number of items offered under varied descriptions for the same product type grows, itemset retrieval has become an essential function for improving customers' shopping experience in online malls. This paper addresses the itemset retrieval problem: given a query item in which a customer is interested, construct an itemset consisting of the items that belong to the same product type. In contrast to previous approaches, which require additional prior information such as itemset memberships or a known number of itemsets, we propose a semi-supervised itemset retrieval model that automatically finds the target itemset for a query item based on two item features, namely textual description and price. Specifically, to identify itemsets precisely, the proposed model utilizes the price feature of an item conditionally, that is, only when the item's textual description feature is relevant to that of the query item. Experimental results on two real-world datasets show that the proposed model outperformed the alternative approaches considered.
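The conditional use of the price feature described above can be illustrated with a minimal sketch. This is not the semi-supervised model proposed in the paper: it replaces that model with a simple two-stage filter, and the function names, the bag-of-words cosine similarity, the thresholds `text_threshold` and `price_tolerance`, and the sample items are all assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two descriptions."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve_itemset(query, items, text_threshold=0.5, price_tolerance=0.3):
    """Illustrative filter (hypothetical, not the paper's model).

    The price of an item is examined only if its description is
    sufficiently similar to the query's description.
    Assumes query["price"] is strictly positive.
    """
    selected = []
    for item in items:
        if cosine_similarity(query["text"], item["text"]) < text_threshold:
            continue  # text not relevant: price is never consulted
        rel_diff = abs(item["price"] - query["price"]) / query["price"]
        if rel_diff <= price_tolerance:
            selected.append(item)
    return selected


# Hypothetical sample data, invented for this sketch.
QUERY = {"id": "q", "text": "acme phone x 64gb black", "price": 500.0}
ITEMS = [
    {"id": "a", "text": "acme phone x 64gb white", "price": 520.0},
    {"id": "b", "text": "acme phone x case black", "price": 15.0},
    {"id": "c", "text": "garden hose 20m", "price": 495.0},
]

RESULT_IDS = [item["id"] for item in retrieve_itemset(QUERY, ITEMS)]
```

In this toy data, item `b` has a description close to the query but a very different price (an accessory rather than the product itself), while item `c` has a similar price but an unrelated description. Consulting price only after the text check keeps the price of `c` from contributing at all, which is the intuition behind conditional feature utilization.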
