Handling Concept Drift and Feature Evolution in Textual Data Stream Using the Artificial Immune System

Data stream mining is an active research area that has attracted the attention of many researchers in the machine learning community. Discovering knowledge from the large volumes of data continuously generated by online services and real-time applications is a challenging analytics task that requires robust and efficient online algorithms. This paper presents a novel method for data stream mining that addresses two main challenges of processing textual data streams: concept drift, in which the underlying data distribution changes over time, and feature evolution, in which the feature set itself changes as new words appear and old ones fall out of use. To address these issues, the proposed method relies on the Artificial Immune System (AIS) metaheuristic, whose strong adaptive capabilities make it robust even in changing environments. The proposed algorithm, AIS-Clus, adapts its model to handle both concept drift and feature evolution in textual data streams. Experiments performed on a textual dataset yield efficient and promising results.
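The abstract does not describe the internals of AIS-Clus. As a rough illustration of how an immune-inspired stream clusterer can respond to both challenges, the following is a minimal hypothetical sketch, not the authors' algorithm: the names (`ImmuneStreamClusterer`, `Antibody`), the cosine affinity, the recognition threshold, and the decay-and-prune forgetting scheme are all assumptions. Documents are kept as sparse term-frequency dicts so unseen words simply add new keys (feature evolution), and antibody stimulation decays on every arrival so clusters that stop receiving documents are eventually removed (one simple response to concept drift).

```python
import itertools
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector stored as a sparse dict.

    Unseen words simply become new keys, so the feature space can grow
    over time (feature evolution) without re-indexing earlier documents.
    """
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing keys count as 0."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Antibody:
    """Hypothetical cluster prototype ("antibody") with a stimulation level."""

    def __init__(self, ab_id, vec):
        self.id = ab_id
        self.center = dict(vec)
        self.stimulation = 1.0

    def absorb(self, vec, rate):
        # Pull the center toward the recognized document; iterating over the
        # union of keys lets the prototype pick up words it has never seen.
        for k in set(self.center) | set(vec):
            old = self.center.get(k, 0.0)
            self.center[k] = (1 - rate) * old + rate * vec.get(k, 0.0)
        self.stimulation += 1.0


class ImmuneStreamClusterer:
    """Hypothetical immune-inspired stream clusterer (illustrative, not AIS-Clus)."""

    def __init__(self, threshold=0.3, rate=0.2, decay=0.9, min_stim=0.05):
        self.threshold = threshold  # minimum affinity that counts as recognition
        self.rate = rate            # step size when an antibody absorbs a document
        self.decay = decay          # forgetting factor applied on every arrival
        self.min_stim = min_stim    # antibodies below this level are removed
        self.antibodies = []
        self._ids = itertools.count()

    def process(self, text):
        """Assign one incoming document to a cluster and return the cluster id."""
        vec = tf_vector(text)
        # Forgetting: every antibody loses stimulation, so clusters that stop
        # receiving documents fade away over time.
        for ab in self.antibodies:
            ab.stimulation *= self.decay
        best, best_aff = None, 0.0
        for ab in self.antibodies:
            aff = cosine(ab.center, vec)
            if aff > best_aff:
                best, best_aff = ab, aff
        if best is None or best_aff < self.threshold:
            # No antibody recognizes the document: spawn a new one.
            best = Antibody(next(self._ids), vec)
            self.antibodies.append(best)
        else:
            best.absorb(vec, self.rate)
        # Prune antibodies whose stimulation has decayed below the floor;
        # the one just created or stimulated always has stimulation >= 1.
        self.antibodies = [ab for ab in self.antibodies
                           if ab.stimulation >= self.min_stim]
        return best.id


clus = ImmuneStreamClusterer()
stream = [
    "stock market shares fall",
    "market shares rise on earnings",
    "football team wins the cup",
    "team scores late to win cup",
]
labels = [clus.process(doc) for doc in stream]
print(labels)  # [0, 0, 1, 1]
```

The two finance sentences share enough vocabulary to be recognized by the same antibody, while the sports sentences trigger a new one. Replacing the decay-and-prune step with clonal selection and mutation of antibodies would move the sketch closer to classical AIS algorithms, at the cost of more parameters.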
