Paper information - Classifying information sender of web documents

Purpose – To develop a method for classifying the information sender of web documents, which constitutes an important part of information credibility analysis.

Design/methodology/approach – A machine learning approach was employed. About 2,000 human-annotated web documents were prepared for training and evaluation. The classification model was based on a support vector machine, and the features used for classification included the title and URL of each document, as well as information from the top page.

Findings – With a relatively small set of features, the proposed method achieved over 50 per cent accuracy.

Research limitations/implications – Some of the information sender categories were found to be more difficult to classify than others. This is due to the subjective nature of the categories, and further refinement of the categories is needed.

Practical implications – When combined with opinion/sentiment analysis techniques, information sender classification allows more profound analysis based on interactions between opinions an...
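
As an illustration of the kind of input the approach describes, the following minimal sketch turns a document's title and URL into a sparse feature dictionary. It is not the authors' implementation: the abstract does not specify the actual feature set, so the function names (`tokenize`, `extract_features`), the `title=`/`host=`/`path=` prefixes, the ASCII-only tokenization, and the sample title and URL are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric ASCII characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def extract_features(title, url):
    """Build a sparse binary feature dict from a page title and URL.

    Hypothetical feature scheme: each token becomes a feature whose name is
    prefixed by its origin, so title words and URL parts stay distinct.
    """
    features = {}
    for tok in tokenize(title):
        features["title=" + tok] = 1.0
    parsed = urlparse(url)
    for tok in tokenize(parsed.netloc):
        features["host=" + tok] = 1.0
    for tok in tokenize(parsed.path):
        features["path=" + tok] = 1.0
    return features


# Made-up sample document, for illustration only.
feats = extract_features(
    "Acme Corp - Product News",
    "http://www.acme.example.com/news/2008/index.html",
)
```

Dictionaries of this shape could then be vectorized and passed to an SVM implementation, for instance scikit-learn's `DictVectorizer` followed by `LinearSVC`. Note that the ASCII-only tokenizer drops non-Latin text, so titles in other scripts would need a language-appropriate tokenizer.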
