Combinatorial Fusion Analysis: Methods and Practices of Combining Multiple Scoring Systems

This paper appears in the publication Advanced Data Mining Technologies in Bioinformatics, edited by Hui-Huang Hsu, © 2006, Idea Group Inc.

Combination methods have been investigated as a means of improving performance in multi-variable (multi-criterion or multi-objective) classification, prediction, learning, and optimization problems. In addition, information collected from multi-sensor or multi-source environments often needs to be combined to produce more accurate information, derive better estimates, or make more knowledgeable decisions. In this chapter, we present a method, called Combinatorial Fusion Analysis (CFA), for analyzing the combination and fusion of multiple scoring systems. CFA characterizes each scoring system by a score function, a rank function, and a rank/score function. Both rank combination and score combination are explored with respect to their combinatorial complexity and computational efficiency. Information derived from the scoring characteristics of each scoring system is used to perform system selection and to decide on the method of combination. In particular, the rank/score graph defined by Hsu, Shapiro and Taksa (Hsu et al., 2002; Hsu & Taksa, 2005) is used to measure the diversity between scoring systems. We illustrate various applications of the framework using examples in information retrieval and biomedical informatics.
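As a rough illustration of the three functions named above, the following Python sketch treats a scoring system as a mapping from items to scores. It is a minimal sketch under stated assumptions, not the chapter's implementation: the rank function is taken to assign rank 1 to the highest score, the rank/score function is taken to map each rank to the score of the item at that rank, score and rank combination are taken to be simple averages, and diversity is taken to be the mean absolute gap between two rank/score functions. All function names, the tie-breaking rule, and the sample scores are hypothetical, and scores are assumed to be normalized to a common range beforehand.

```python
from typing import Dict, List

Scores = Dict[str, float]  # item identifier -> score (assumed normalized)


def rank_function(scores: Scores) -> Dict[str, int]:
    """Assign rank 1 to the highest score; ties broken by item name (an assumption)."""
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return {item: position + 1 for position, item in enumerate(ordered)}


def rank_score_function(scores: Scores) -> List[float]:
    """Entry i-1 holds the score of the item at rank i (assumed definition)."""
    return sorted(scores.values(), reverse=True)


def score_combination(systems: List[Scores]) -> Scores:
    """Average the scores each item receives across all systems."""
    return {item: sum(s[item] for s in systems) / len(systems)
            for item in systems[0]}


def rank_combination(systems: List[Scores]) -> Dict[str, float]:
    """Average the ranks each item receives; a lower value is better."""
    ranks = [rank_function(s) for s in systems]
    return {item: sum(r[item] for r in ranks) / len(ranks)
            for item in systems[0]}


def rank_score_diversity(a: Scores, b: Scores) -> float:
    """Mean absolute gap between two rank/score functions (one possible measure)."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


# Hypothetical scores from two systems over the same three items.
system_a = {"d1": 1.0, "d2": 0.5, "d3": 0.0}
system_b = {"d1": 0.25, "d2": 1.0, "d3": 0.75}

combined_scores = score_combination([system_a, system_b])
combined_ranks = rank_combination([system_a, system_b])
diversity = rank_score_diversity(system_a, system_b)
```

In this toy case both combinations place d2 first, but in general the two orderings can differ. The rank/score function depends only on how a system distributes its scores over ranks, not on which item holds each rank.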

[1]  Brian K. Shoichet,et al.  Virtual screening of chemical libraries , 2004, Nature.

[2]  Ofer Melnik,et al.  Mixed group ranks: preference and confidence in classifier combination , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Karen E Vigneau-Callahan,et al.  Characterization of diet-dependent metabolic serotypes: primary validation of male and female serotypes in independent cohorts of rats. , 2002, The Journal of nutrition.

[4]  D. Frank Hsu,et al.  Comparing Rank and Score Combination Methods for Data Fusion in Information Retrieval , 2005, Information Retrieval.

[5]  Stuart M. Brown,et al.  Selection and validation of differentially expressed genes in head and neck cancer , 2004, Cellular and Molecular Life Sciences CMLS.

[6]  Marie-Claude Heydemann,et al.  Cayley graphs and interconnection networks , 1997 .

[7]  Sargur N. Srihari,et al.  Combination of Decisions by Multiple Classifiers , 1992 .

[8]  Chuan Yi Tang,et al.  Feature selection and combination criteria for improving predictive accuracy in protein structure classification , 2005, Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05).

[9]  Karen E Vigneau-Callahan,et al.  Characterization of diet-dependent metabolic serotypes: proof of principle in female and male rats. , 2002, The Journal of nutrition.

[10]  E. A. Fox,et al.  Combining the Evidence of Multiple Query Representations for Information Retrieval , 1995, Inf. Process. Manag..

[11]  Cheng-Yan Kao,et al.  Combination methods in microarray analysis , 2004, 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings..

[12]  H. Young,et al.  A Consistent Extension of Condorcet’s Election Principle , 1978 .

[13]  Honglian Shi,et al.  Development of biomarkers based on diet-dependent metabolic serotypes: practical issues in development of expert system-based classification models in metabolomic studies. , 2004, Omics : a journal of integrative biology.

[14]  Jong-Hak Lee,et al.  Analyses of multiple evidence combination , 1997, SIGIR '97.

[15]  Belur V. Dasarathy,et al.  Elucidative fusion systems - an exposition , 2000, Inf. Fusion.

[16]  H. Young Social Choice Scoring Functions , 1975 .

[17]  G. L. Thompson Graphical Techniques for Ranked Data , 1993 .

[18]  Peter McCullagh Models on Spheres and Models for Permutations , 1993 .

[19]  Paul B. Kantor,et al.  Counter-intuitive Cases of Data Fusion in Information Retrieval. , 2001 .

[20]  César Hervás-Martínez,et al.  Cooperative coevolution of artificial neural network ensembles for pattern classification , 2005, IEEE Transactions on Evolutionary Computation.

[21]  Jürgen Branke,et al.  Evolutionary optimization in uncertain environments-a survey , 2005, IEEE Transactions on Evolutionary Computation.

[22]  G. Winkler,et al.  A combination of statistical and syntactical pattern recognition applied to classification of unconstrained handwritten numerals , 1980, Pattern Recognit..

[23]  D. Frank Hsu,et al.  Consensus Scoring Criteria for Improving Enrichment in Virtual Screening , 2005, J. Chem. Inf. Model..

[24]  Ugo Paolucci,et al.  Development of biomarkers based on diet-dependent metabolic serotypes: characteristics of component-based models of metabolic serotypes. , 2004, Omics : a journal of integrative biology.

[25]  Chuan Yi Tang,et al.  Improving prediction accuracy for protein structure classification by neural network using feature combination , 2005 .

[26]  Moni Naor,et al.  Rank aggregation methods for the Web , 2001, WWW '01.

[27]  Takeo Kanade,et al.  Algorithms for cooperative multisensor surveillance , 2001, Proc. IEEE.

[28]  Javed A. Aslam,et al.  A unified model for metasearch, pooling, and system evaluation , 2003, CIKM '03.

[29]  Hui-Huang Hsu,et al.  Advanced Data Mining Technologies in Bioinformatics , 2006 .

[30]  Josef Kittler,et al.  Sum Versus Vote Fusion in Multiple Classifier Systems , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Tin Kam Ho,et al.  MULTIPLE CLASSIFIER COMBINATION: LESSONS AND NEXT STEPS , 2002 .

[32]  Sargur N. Srihari,et al.  Decision Combination in Multiple Classifier Systems , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Jochen Triesch,et al.  Democratic Integration: Self-Organized Integration of Adaptive Cues , 2001, Neural Computation.

[34]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[35]  Adam Krzyżak,et al.  Methods of combining multiple classifiers and their applications to handwriting recognition , 1992, IEEE Trans. Syst. Man Cybern..

[36]  Hongfang Liu,et al.  Identifying significant genes from microarray data , 2004, Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering.

[37]  Kwong Bor Ng,et al.  An investigation of the conditions for effective data fusion in information retrieval , 1998 .

[38]  Arthur T. White,et al.  Permutation Groups and Combinatorial Structures , 1979 .

[39]  P. J. Fleming,et al.  The good of the many outweighs the good of the one: evolutionary multi-objective optimization , 2003 .

[40]  Kazuhiko Yamamoto,et al.  Structured Document Image Analysis , 1992, Springer Berlin Heidelberg.

[41]  L. Cooper,et al.  When Networks Disagree: Ensemble Methods for Hybrid Neural Networks , 1992 .

[42]  P. Willett,et al.  Combination of molecular similarity measures using data fusion , 2000 .

[43]  Chuan Yi Tang,et al.  Methods of Improving Protein Structure Prediction Based on HLA Neural Network and Combinatorial Fusion Analysis , 2005 .

[44]  Garrison W. Cottrell,et al.  Fusion Via a Linear Combination of Scores , 1999, Information Retrieval.

[45]  M. Kendall Rank Correlation Methods , 1949 .

[46]  J. Marden Analyzing and Modeling Rank Data , 1996 .

[47]  Ronald Fagin,et al.  Comparing top k lists , 2003, SODA '03.

[48]  Kagan Tumer,et al.  Linear and Order Statistics Combiners for Pattern Classification , 1999, ArXiv.

[49]  Xin Yao,et al.  Diversity creation methods: a survey and categorisation , 2004, Inf. Fusion.

[50]  G. P. Patil,et al.  Multiple indicators, partially ordered sets, and linear extensions: Multi-criterion ranking and prioritization , 2004, Environmental and Ecological Statistics.

[51]  Paul B. Kantor,et al.  Predicting the effectiveness of Naïve data fusion on the basis of system characteristics , 2000 .

[52]  Chuen-Der Huang,et al.  Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification , 2003, IEEE Transactions on NanoBioscience.

[53]  D. Frank Hsu,et al.  A Study of Data Fusion in Cayley Graphs G(S{n}, P{n}). , 2004 .

[54]  Marco Gori,et al.  A unified probabilistic framework for Web page scoring systems , 2004, IEEE Transactions on Knowledge and Data Engineering.

[55]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[56]  Ludmila I. Kuncheva Diversity in multiple classifier systems , 2005, Inf. Fusion.

[57]  Miltos D. Grammatikakis,et al.  Parallel System Interconnections and Communications , 2000 .

[58]  Tieniu Tan,et al.  A survey on visual surveillance of object motion and behaviors , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[59]  K. Arrow,et al.  Social Choice and Individual Values , 1951 .