New Frontiers For An Artificial Immune System

AIRS, a resource limited artificial immune classifier system, has performed well on various classification tasks, including data clustering. This thesis proposes the use of this system for the complex task of multi-class document classification. Initially, the AIRS system is validated using a standard machine learning dataset that has not previously been used with this classifier. The use of AIRS for document classification is then examined. This includes the pre-processing of HTML documents and the extraction, selection and representation of features for the purpose of feature vector compilation. AIRS was used to classify various Internet documents, using a variety of datasets. Comparisons were made in which the number of documents, the number of classes and the number of features were varied independently. Additionally, AIRS was compared with another text classification package as a benchmarking exercise. On completion of this work, we are confident that AIRS is a suitable candidate for increasingly complex tasks such as hierarchical document classification and multiple taxonomic mappings.

Acknowledgements

In some ways, this is the most difficult section to write. There are so many people who have provided me with their support for the duration of what has been an awesome 6 months, and it is not easy to find the right words to express my gratitude. However, ‘thanks’ go to my family and pseudo-family for their unquestionable support. I would also like to thank Liz, Jean, and Justin for having the patience to answer all my annoying bash and C++ questions, and Jamie for all the advice and inspiration during the early stages of this project. Thanks also go to Marco and the rest of the HP Labs-Bristol students, without whom life would be just one giant classification problem. On a more formal note, I would like to thank Drs Dave Cliff, Matt Williamson and Jason Noble for giving me this wonderful opportunity in the first place.
Special thanks go to Dr Steve Cayzer, who has officially been the most fantastic supervisor, always being there to answer my incessant questions, despite his obsession with semantic blogging. Last but by no means least, I would like to thank Gillan for his relentless love, support, and encouragement, which will forever mean the world to me.
