Frequent Variable Sets Based Clustering for Artificial Neural Networks Particle Classification

Particle classification is one of the major analyses in high-energy particle physics experiments. We design a framework that combines clustering and classification for particle physics experiment data. The system performs classification with a set of Artificial Neural Networks (ANNs), each trained on a distinct subset of samples drawn from the full training set. We use frequent-variable-set-based clustering to partition the training samples into several natural subsets, and then train a standard back-propagation ANN on each subset. The final decision for each test case is a two-step process: first, the nearest cluster to the case is found; then the decision is made by the ANN classifier trained on that cluster. Comparisons with other classification and clustering methods show that our method is promising.
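The two-step pipeline above can be sketched as follows. This is an illustrative approximation, not the paper's implementation: k-means on synthetic 2-D data stands in for the frequent-variable-set clustering, and a single-layer logistic classifier stands in for each back-propagation ANN; all data, seeds, and parameters here are assumptions for the sketch.

```python
# Illustrative sketch of the cluster-then-classify pipeline.
# Assumptions (not from the paper): k-means replaces the frequent-variable-set
# clustering, and a single-layer logistic classifier replaces each ANN.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, centroids, iters=20):
    """Lloyd's algorithm starting from the given initial centroids."""
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def train_logistic(X, y, lr=0.5, epochs=200):
    """Gradient descent on the logistic loss (one simple 'ANN' per cluster)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30, 30)
        g = 1.0 / (1.0 + np.exp(-z)) - y          # dLoss/dz
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# Synthetic XOR-like data: the classes are NOT globally linearly separable,
# but each local cluster is easy -- the situation that motivates the pipeline.
centers = [(0, 0), (3, 3), (0, 3), (3, 0)]
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in centers])
y = np.array([0] * 100 + [1] * 100)

# Step 1: partition the training samples into clusters.
init = X[[0, 50, 100, 150]].copy()                # one seed per blob
centroids, cl = kmeans(X, init)

# Step 2: train one classifier on each cluster's samples.
models = {j: train_logistic(X[cl == j], y[cl == j]) for j in range(4)}

def predict(x):
    # Find the nearest cluster, then apply that cluster's own classifier.
    j = int(np.argmin(((centroids - x) ** 2).sum(-1)))
    w, b = models[j]
    return int(x @ w + b > 0)

acc = float(np.mean([predict(x) == t for x, t in zip(X, y)]))
```

On this XOR-like layout a single linear model fails, while the local classifiers recover essentially all of the labels, which illustrates why partitioning the training set before training the per-cluster models can help.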