Paper information - Assessing documents' credibility with genetic programming

Assessing documents' credibility with genetic programming

Example credibility measures how much a classifier can trust a training example when building a classification model. It is given by a credibility function, which is application-dependent and is estimated from the factors that influence how credible an example is. Here we address automatic document classification and study the credibility of a document according to three factors: content, authorship and citations. We propose a genetic programming (GP) algorithm that estimates the credibility of training examples, and we feed this estimate into a credibility-aware classifier. To do so, we model the authorship and citation data as a complex network and select a set of structural metrics that can be used to estimate credibility. These metrics are combined with content-related metrics and used as terminals for the GP. The GP was tested on a subset of the ACM Digital Library (ACM-DL), and the credibility-aware classifier achieved micro-F1 and macro-F1 scores 5% to 8% higher than those of traditional classifiers.
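The idea in the abstract can be illustrated with a minimal Python sketch, under several assumptions that do not come from the paper. The terminal names (`citations_in`, `author_degree`), the function set, the sigmoid squashing, and the similarity-weighted nearest-neighbour vote are all hypothetical stand-ins. A hand-written expression tree plays the role of one GP individual, and the evolutionary search itself (selection, crossover, mutation) is omitted.

```python
import math
import operator


def protected_div(a, b):
    """Division that returns 1.0 near zero, a common GP convention."""
    return a / b if abs(b) > 1e-9 else 1.0


# Hypothetical function set for the expression trees.
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}


def evaluate(tree, terminals):
    """Evaluate a tree: (op, left, right), a terminal name, or a constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, terminals),
                             evaluate(right, terminals))
    if isinstance(tree, str):
        return terminals[tree]
    return float(tree)


def credibility(tree, terminals):
    """Squash the raw tree output into (0, 1) so it can serve as a weight."""
    raw = max(-50.0, min(50.0, evaluate(tree, terminals)))  # avoid overflow
    return 1.0 / (1.0 + math.exp(-raw))


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(query, training, tree, k=3):
    """Vote among the k most similar examples, scaled by their credibility."""
    nearest = sorted(training, key=lambda ex: cosine(query, ex["terms"]),
                     reverse=True)[:k]
    votes = {}
    for ex in nearest:
        weight = cosine(query, ex["terms"]) * credibility(tree, ex["metrics"])
        votes[ex["label"]] = votes.get(ex["label"], 0.0) + weight
    return max(votes, key=votes.get)


# Toy data (invented for illustration): one well-cited document and two
# documents with no citation or co-authorship support.
TRAINING = [
    {"terms": {"genetic": 1.0, "programming": 1.0}, "label": "AI",
     "metrics": {"citations_in": 6.0, "author_degree": 4.0}},
    {"terms": {"genetic": 1.0, "programming": 1.0, "network": 1.0},
     "label": "Networks",
     "metrics": {"citations_in": 0.0, "author_degree": 0.0}},
    {"terms": {"genetic": 1.0, "programming": 1.0, "network": 1.0},
     "label": "Networks",
     "metrics": {"citations_in": 0.0, "author_degree": 0.0}},
]
QUERY = {"genetic": 1.0, "programming": 1.0}

# Stand-in for an evolved individual: citations_in + 0.5 * author_degree - 4
TREE = ("-", ("+", "citations_in", ("*", "author_degree", 0.5)), 4.0)

if __name__ == "__main__":
    print(classify(QUERY, TRAINING, 0.0))   # constant tree: uniform credibility
    print(classify(QUERY, TRAINING, TREE))  # credibility-aware vote
```

Passing the constant tree `0.0` gives every example the same credibility, so the vote reduces to a plain similarity-weighted one and the two weakly supported documents win. With `TREE`, the well-cited document dominates the vote. In a full GP, the fitness of each tree would presumably be the classifier's accuracy or F1 on held-out data, which this sketch does not attempt.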
