Using information gain to build meaningful decision forests for multilabel classification

“Gain-Based Separation” is a novel heuristic that modifies the standard multiclass decision tree learning algorithm to produce forests that can describe an example or object with multiple labels. When the information gain at a node would be higher if all examples of a particular class were removed, those examples are reserved for another tree. In this way, the algorithm performs an automated separation of classes into categories: classes are mutually exclusive within a tree but not across trees. The algorithm was tested on naive subjects' descriptions of objects to a robot, using YUV color-space features and basic size and distance features. The new method outperforms the common strategy of decomposing a multilabel problem into L binary-outcome decision trees, one per label, and also outperforms RAkEL [19], a recent method for producing random multilabel forests.
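
As a sketch of the core heuristic, the Python fragment below shows the node-level check described above: before committing to a split, compare the best achievable information gain against the gain achievable after removing all examples of each class in turn, and reserve the winning class's examples for a later tree. The function names, the greedy one-class-at-a-time search, and the representation of examples as (features, label) pairs are illustrative assumptions, not the paper's published pseudocode.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, split):
    """Gain of a boolean split function over (features, label) pairs."""
    labels = [y for _, y in examples]
    left = [y for x, y in examples if split(x)]
    right = [y for x, y in examples if not split(x)]
    if not left or not right:
        return 0.0
    weighted = (len(left) / len(labels)) * entropy(left) \
             + (len(right) / len(labels)) * entropy(right)
    return entropy(labels) - weighted

def gain_based_separation(examples, candidate_splits):
    """One node of Gain-Based Separation (hypothetical sketch):
    if removing every example of some class would raise the best
    achievable gain, reserve that class's examples for a later tree.
    Returns (examples kept at this node, examples reserved)."""
    best_class = None
    best_gain = max(information_gain(examples, s) for s in candidate_splits)
    for cls in set(y for _, y in examples):
        rest = [(x, y) for x, y in examples if y != cls]
        if len(set(y for _, y in rest)) < 2:
            continue  # removing this class leaves nothing to split on
        gain = max(information_gain(rest, s) for s in candidate_splits)
        if gain > best_gain:
            best_class, best_gain = cls, gain
    if best_class is None:
        return examples, []
    kept = [(x, y) for x, y in examples if y != best_class]
    reserved = [(x, y) for x, y in examples if y == best_class]
    return kept, reserved
```

Under this reading, a forest would be grown by training a tree as usual, pooling everything reserved at its nodes, and using that pool as the training set for the next tree; within each tree the surviving classes remain mutually exclusive, matching the description above.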

[1] Zhi-Hua Zhou, et al. Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization, 2006, IEEE Transactions on Knowledge and Data Engineering.

[2] Eve V. Clark, et al. The principle of contrast: A constraint on language acquisition, 1987.

[3] Rohit J. Kate, et al. Learning Language Semantics from Ambiguous Supervision, 2007, AAAI.

[4] King-Sun Fu, et al. A method for the design of binary tree classifiers, 1983, Pattern Recognition.

[5] Catherine L. Harris, et al. The human semantic potential: Spatial language and constrained connectionism, 1997.

[6] Stuart M. Shieber, et al. Prolog and Natural-Language Analysis, 1987.

[7] Yiannis Kompatsiaris, et al. An Empirical Study of Multi-label Learning Methods for Video Annotation, 2009, Seventh International Workshop on Content-Based Multimedia Indexing.

[8] Weida Tong, et al. Multiclass Decision Forest--a novel pattern recognition method for multiclass classification in microarray data analysis, 2004, DNA and Cell Biology.

[9] Jeffrey Mark Siskind, et al. Lexical Acquisition in the Presence of Noise and Homonymy, 1994, AAAI.

[10] E. Markman, et al. Children's use of mutual exclusivity to constrain the meanings of words, 1988, Cognitive Psychology.

[11] Zhi-Hua Zhou, et al. ML-KNN: A lazy learning approach to multi-label learning, 2007, Pattern Recognition.

[12] Alex Pentland, et al. Learning words from sights and sounds: a computational model, 2002, Cognitive Science.

[13] Dana H. Ballard, et al. A multimodal learning interface for grounding spoken language in sensory perceptions, 2004, ACM Transactions on Applied Perception.

[14] Christian Lebiere, et al. The Cascade-Correlation Learning Architecture, 1989, NIPS.

[15] Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[16] Brian Scassellati, et al. Robotic vocabulary building using extension inference and implicit contrast, 2009, Artificial Intelligence.

[17] J. Ross Quinlan, et al. Induction of Decision Trees, 1986, Machine Learning.

[18] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[19] Grigorios Tsoumakas, et al. Random k-Labelsets: An Ensemble Method for Multilabel Classification, 2007, ECML.

[20] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[21] Angus Gellatly, et al. Colourful Whorfian Ideas: Linguistic and Cultural Influences on the Perception and Cognition of Colour, and on the Investigation of Them, 1995.

[22] Jun Zhang, et al. Learning Hierarchical Classifiers with Class Taxonomies, 2005.

[23] David L. Dowe, et al. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes, 2003, Australian Conference on Artificial Intelligence.

[24] Vasant Honavar, et al. Learning Classifiers Using Hierarchically Structured Class Taxonomies, 2005, SARA.

[25] P. Kay, et al. Resolving the question of color naming universals, 2003, Proceedings of the National Academy of Sciences of the United States of America.