Evaluating the Effect of Preprocessing in Arabic Documents Clustering
