Weight evaluation for features via constrained data-pairs

Given the massive amount of data appearing on the web, automatic analysis tools have become essential for web users to discover valuable information online. Precise similarity measurement plays a decisive role in enabling such tools to perform well. Because different features contribute differently to similarity calculation, each feature's contribution should be expressed as a weight and incorporated into the similarity measure. To assign feature weights accurately, constrained data-pairs provided by users are usually incorporated into the weight evaluation procedure. However, conventional approaches fail to consider two challenges: (a) the asymmetric distribution of constrained data-pairs, and (b) inconsistency among constrained data-pairs. When these two issues occur, conventional approaches handle them poorly or cannot work at all. This paper therefore proposes a novel constraint-based weight evaluation that addresses both issues. For the former, constrained data-pairs are partitioned into several equivalence classes, and distributing parameters are assigned to the constrained data-pairs to balance their distribution. For the latter, constrained data-pairs are connected one after another, and belief values are thereby formed to indicate each pair's probability of being inconsistent. Experimental results demonstrate that this evaluation does not depend on any particular algorithm and that, with it, similarities can be calculated more accurately.
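The abstract does not give the formulas behind the partitioning step, so the following is only a minimal sketch of one way it might be read. It assumes that two constrained data-pairs fall into the same equivalence class when they are linked through shared items, and that a pair's distributing parameter is inversely proportional to the size of its class. Both assumptions, and the names `equivalence_classes` and `balancing_parameters`, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(pairs):
    """Group pairs that are linked through shared items (assumed reading)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two items of every constrained pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the pairs under the root of their connected component.
    classes = defaultdict(list)
    for a, b in pairs:
        classes[find(a)].append((a, b))
    return list(classes.values())


def balancing_parameters(pairs):
    """Hypothetical balancing rule: every class receives the same total mass,
    shared equally among the pairs it contains."""
    classes = equivalence_classes(pairs)
    return {
        pair: 1.0 / (len(classes) * len(members))
        for members in classes
        for pair in members
    }


# Three chained pairs form one large class; ("x", "y") forms a small one.
pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
params = balancing_parameters(pairs)
```

Under this rule a pair from a small class carries more weight than a pair from a large one, so a heavily constrained region of the data does not dominate the weight evaluation. The belief values for inconsistent pairs are not sketched here, since the abstract does not describe how they are computed.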
