Toward One Class Classifier techniques applied to verifier information

One-class classification techniques can identify data that are unknown with respect to a group of known observations. In this setting, the training set contains only instances of the known class and no, or very few, instances of unknown data. The wrapper verification problem has the same shape: during training, we only have instances of the classes we know. One-class classification techniques can therefore be applied to it. To evaluate the performance of these methods, we use several databases proposed in the current literature. We describe statistical analyses of the results obtained by some basic one-class classification techniques: parzen_dd, gauss_dd, svmdd, som_dd and knndd.
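
To make the training setting concrete, the following is a minimal sketch of a nearest-neighbour data description, written in the spirit of knndd but not taken from any toolbox. The class name `KnnDataDescription`, the parameters `k` and `reject_fraction`, the thresholding rule and the toy grid data are all assumptions made for illustration. The model is fitted on target instances only and accepts a new object when the distance to its k-th nearest training instance does not exceed a threshold estimated from the training data.

```python
import math


def kth_neighbor_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest element of `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[k - 1]


class KnnDataDescription:
    """Illustrative one-class model (hypothetical, not a toolbox implementation)."""

    def __init__(self, k=1, reject_fraction=0.05):
        self.k = k
        self.reject_fraction = reject_fraction  # share of targets we allow to be rejected

    def fit(self, targets):
        # Only instances of the known (target) class are available for training.
        self.targets = [tuple(t) for t in targets]
        # Leave-one-out k-NN distance of every training instance.
        scores = sorted(
            kth_neighbor_distance(
                t, self.targets[:i] + self.targets[i + 1:], self.k
            )
            for i, t in enumerate(self.targets)
        )
        # Assumed rule: pick the threshold so that about `reject_fraction`
        # of the training targets would fall outside the description.
        index = math.ceil((1 - self.reject_fraction) * len(scores)) - 1
        self.threshold = scores[max(0, min(index, len(scores) - 1))]
        return self

    def score(self, point):
        # Larger scores mean the object lies further from the known data.
        return kth_neighbor_distance(point, self.targets, self.k)

    def is_target(self, point):
        return self.score(point) <= self.threshold


# Toy target class: a regular 10 x 10 grid inside the unit square.
known = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
model = KnnDataDescription(k=1, reject_fraction=0.05).fit(known)

print(model.is_target((0.45, 0.45)))  # object inside the region covered by the targets
print(model.is_target((5.0, 5.0)))    # object far away from every target
```

In a verification scenario, the target instances would be feature vectors computed from data that a wrapper extracted correctly, and an object rejected by `is_target` would be flagged as unknown. How those features are built is outside the scope of this sketch.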
