Utilizing hierarchical feature domain values for prediction

We propose a Bayesian learning framework that exploits hierarchical structures over discrete feature domain values to improve prediction performance when training data are sparse. A key characteristic of the framework is a principled, mean-variance-based procedure for transforming an original feature domain value into a coarser-granularity value by exploiting the underlying hierarchy. This transformation trades precision for robustness and thereby improves parameter estimation for prediction. Comparative experiments on three real-world data sets demonstrate that exploiting domain value hierarchies benefits prediction.
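
To make the back-off idea concrete, the following minimal Python sketch shows one way a discrete feature value might be replaced by a coarser ancestor from its hierarchy when value-specific counts are too sparse for a stable estimate. The toy hierarchy (PARENT), the min_count threshold, and the Laplace-smoothed conditional probability are illustrative assumptions; a simple count threshold stands in for the mean-variance criterion, whose exact form is not spelled out in the abstract.

```python
from collections import Counter

# Toy hierarchy over one discrete feature's domain values: each value
# maps to its parent (coarser) value.  The hierarchy, the threshold,
# and the smoothing below are illustrative assumptions only.
PARENT = {
    "golden_retriever": "dog", "poodle": "dog",
    "siamese": "cat", "persian": "cat",
    "dog": "mammal", "cat": "mammal",
}

def ancestors(value):
    """Yield the value itself, then successively coarser ancestors."""
    while value is not None:
        yield value
        value = PARENT.get(value)

def coarsen(value, counts, min_count=3):
    """Back off to the most specific ancestor observed often enough for a
    stable estimate (a crude stand-in for a mean-variance criterion:
    fewer observations imply higher estimation variance)."""
    for v in ancestors(value):
        if counts[v] >= min_count:
            return v
    return v  # root of the hierarchy if everything is sparse

def cond_prob(value, label, joint, label_counts, domain_size, alpha=1.0):
    """Laplace-smoothed estimate of P(value | label) after coarsening."""
    return (joint[(value, label)] + alpha) / (label_counts[label] + alpha * domain_size)

# Tiny synthetic training set of (feature value, class label) pairs.
data = [("golden_retriever", "+"), ("poodle", "+"), ("poodle", "+"),
        ("siamese", "-"), ("siamese", "-"), ("persian", "-")]

agg = Counter()      # value counts aggregated up the hierarchy
joint = Counter()    # (value, label) counts, also aggregated
labels = Counter()
for v, y in data:
    labels[y] += 1
    for a in ancestors(v):
        agg[a] += 1
        joint[(a, y)] += 1

v = coarsen("golden_retriever", agg)           # sparse leaf backs off to "dog"
p = cond_prob(v, "+", joint, labels, domain_size=len(agg))
print(v, round(p, 3))                          # -> dog 0.4
```

In the framework described above, the chosen granularity would be driven by a mean-variance (bias-variance) tradeoff rather than the fixed count threshold used in this sketch.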
