A new conceptual model for dynamic text clustering: Using unstructured text as a case

In recent years, clustering has become a critical success factor for data analysis. Most clustering methods are sensitive to outliers, noise, presentation order, configuration architecture, Bellman's curse of dimensionality, and complex cluster shapes. They rely on cost functions that encode general knowledge about the internal structure and distribution of the target data, and they provide no mechanism for reflecting the dynamics of the clustering environment in the data set. Hence, in the present study, an alternative numerical scheme (SC) is proposed to enhance the predictive accuracy of clustering. Our approach exploits variable selection techniques and Fuzzy Adaptive Resonance Theory to increase the productivity of knowledge extraction.
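To make the role of Fuzzy Adaptive Resonance Theory more concrete, the following is a minimal sketch of the textbook Fuzzy ART procedure (complement coding, choice function, vigilance test, weight update). It is not the scheme proposed in this study. The function names, the parameter names `rho`, `alpha` and `beta`, and the toy input vectors are assumptions made for illustration, and inputs are assumed to be already scaled to [0, 1].

```python
def complement_code(a):
    """Complement coding: append 1 - x for every feature x in [0, 1]."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, w):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(xi, wi) for xi, wi in zip(x, w)]


def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster samples one at a time; categories are created on demand.

    rho   -- vigilance threshold (higher means tighter categories)
    alpha -- small choice parameter
    beta  -- learning rate (1.0 is fast learning)
    """
    weights = []  # one weight vector per category
    labels = []   # category index assigned to each sample
    for a in samples:
        i_vec = complement_code(a)
        norm_i = sum(i_vec)

        # Rank existing categories by the choice function T_j.
        def choice(j):
            return sum(fuzzy_and(i_vec, weights[j])) / (alpha + sum(weights[j]))

        order = sorted(range(len(weights)), key=choice, reverse=True)

        chosen = None
        for j in order:
            # Vigilance test: accept the first category that matches well enough.
            match = sum(fuzzy_and(i_vec, weights[j])) / norm_i
            if match >= rho:
                chosen = j
                break

        if chosen is None:
            # No category resonates: commit a new one initialised to the input.
            weights.append(i_vec)
            chosen = len(weights) - 1
        else:
            # Move the winning weight vector toward (input AND weight).
            m = fuzzy_and(i_vec, weights[chosen])
            weights[chosen] = [
                beta * mi + (1.0 - beta) * wi
                for mi, wi in zip(m, weights[chosen])
            ]
        labels.append(chosen)
    return labels, weights


# Toy data (assumed, not from the study): two well-separated groups.
toy = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]]
labels, weights = fuzzy_art(toy)
print(labels)  # [0, 0, 1, 1]
```

Because a new category is committed whenever no existing one passes the vigilance test, the number of clusters is not fixed in advance and can grow as new data arrive, which is the property that makes this family of methods attractive for dynamic clustering.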
