A Deep CFS Model for Text Clustering

With the rapid development of Internet technology, court-related text is being collected at an unprecedented rate from a variety of sources, such as Weibo and WeChat. The sheer volume of this text makes it difficult for judges to reach reasonable decisions based on the large number of available cases. To group relevant reference cases out of this mass of cases, this paper proposes a deep CFS model for text clustering that clusters court text effectively. The model includes a robust deep text feature extractor, in which an ensemble of deep learning models learns deep features of the text, to improve clustering accuracy. The CFS algorithm (clustering by fast search and find of density peaks) is then applied to the extracted deep features to discover non-spherical clusters and to identify the cluster centers automatically. Finally, the proposed deep clustering model is evaluated on two typical datasets, and the results show that it achieves higher clustering accuracy than the compared models.
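The clustering stage can be illustrated with a minimal sketch of density-peak (CFS) clustering in Python with NumPy. This is not the authors' implementation. It assumes the deep text features are already available as a numeric matrix `X`, uses a Gaussian kernel for the local density (a common variant of the cutoff kernel), and picks centers as the points with the largest product of density and separation. The names `cfs_cluster`, `d_c`, and `n_clusters` are hypothetical and chosen only for illustration; the abstract does not specify how centers are selected.

```python
import numpy as np

def cfs_cluster(X, d_c, n_clusters):
    """Sketch of density-peak clustering on a feature matrix X (n_samples, n_features)."""
    n = len(X)
    # Pairwise Euclidean distances between all feature vectors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density rho: Gaussian kernel with bandwidth d_c (assumed variant).
    # Each point counts itself, which shifts all densities equally.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1)
    # Visit points from highest to lowest density.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)            # distance to nearest point of higher density
    parent = np.full(n, -1)        # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers combine high density with large separation.
    gamma = rho * delta
    gamma[order[0]] = np.inf       # the densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser neighbor,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Because each non-center point follows its nearest denser neighbor instead of its nearest center, the assignment can trace elongated or curved regions, which is why this family of methods is not restricted to spherical clusters. In practice `X` would hold the features produced by the deep extractor, and `d_c` would need tuning for the data at hand.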
