Word Cluster-based Mobile Application Categorization

In this paper, we propose a mobile application categorization method that uses word cluster information. Because mobile application descriptions are often short, the proposed method uses word cluster seeds, in addition to the words in the description, as categorization features. To handle the fragmented categories of mobile applications, the proposed method generates word clusters by applying the per-category frequency of word occurrence to the K-means clustering algorithm. Since a mobile application description can include paragraphs unrelated to categorization, such as installation specifications, the proposed method uses only those word clusters that are useful for categorization. Experiments show that using word cluster information improves recall by 5.65%.
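The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy corpus, the function names (`word_category_vectors`, `kmeans`, `features`), the farthest-point seeding, and the `CLUSTER_n` feature format are all assumptions made for illustration. Each word is represented by its relative frequency in each category, the word vectors are grouped with a plain K-means loop, and the cluster labels are appended to the words of a description as extra features. The step that selects only the useful clusters is not shown.

```python
from collections import Counter, defaultdict


def word_category_vectors(docs, categories):
    """Map each word to its relative frequency of occurrence per category."""
    counts = defaultdict(Counter)
    for text, category in docs:
        for word in text.lower().split():
            counts[word][category] += 1
    vectors = {}
    for word, per_cat in counts.items():
        total = sum(per_cat.values())
        vectors[word] = [per_cat[c] / total for c in categories]
    return vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means over word vectors.

    Seeding is deterministic (an assumption of this sketch): the first
    centroid is the alphabetically first word, and each further centroid
    is the word farthest from the centroids chosen so far.
    """
    words = sorted(vectors)
    centroids = [list(vectors[words[0]])]
    while len(centroids) < k:
        far = max(words, key=lambda w: min(sqdist(vectors[w], c) for c in centroids))
        centroids.append(list(vectors[far]))

    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach every word to its nearest centroid.
        for w in words:
            dists = [sqdist(vectors[w], c) for c in centroids]
            assignment[w] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[w] for w in words if assignment[w] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def features(text, assignment):
    """Return the description's words plus the labels of their clusters."""
    words = text.lower().split()
    clusters = sorted({f"CLUSTER_{assignment[w]}" for w in words if w in assignment})
    return words + clusters


# Hypothetical toy corpus: (description, category) pairs.
docs = [
    ("play puzzle game", "games"),
    ("race game play", "games"),
    ("bank budget money", "finance"),
    ("money bank loan", "finance"),
]
categories = ["games", "finance"]

vectors = word_category_vectors(docs, categories)
assignment = kmeans(vectors, k=2)
print(features("play loan app", assignment))
```

In this sketch, a word that never appeared in training (here `app`) contributes no cluster feature, while known words contribute the label of their cluster even when the description itself is short. A classifier could then be trained on the combined feature list.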
