Link-Based Text Classification Using Bayesian Networks

In this paper we propose a new methodology for link-based document classification built on probabilistic classifiers and Bayesian networks. We also report the results of applying it to the XML Document Mining Track of INEX'09.
