A CMAC-based scheme for determining membership with classification of text strings

Abstract. Determining the membership of text strings is an important procedure for analyzing large volumes of textual data, especially when time is a crucial factor. The Bloom filter is a well-known approach to this problem because of its succinct structure and simple determination procedure. As membership determination with classification becomes increasingly desirable, parallel Bloom filters are often implemented to meet the additional classification requirement. Parallel Bloom filters, however, tend to produce additional false-positive errors, since membership determination must be performed on each of the parallel layers. We propose a scheme based on CMAC (the cerebellar model articulation controller), a neural network mapping, which requires only a single-layer calculation to obtain both membership and classification information simultaneously. A hash function specifically designed for text strings is also proposed. The proposed scheme can effectively reduce false-positive errors by converging the range of membership acceptance to the minimum for each class during the neural network mapping. Simulation results show that the proposed scheme committed significantly fewer errors than the benchmark, parallel Bloom filters, with limited and identical memory usage at different classification levels.
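To make the benchmark concrete, the following is a minimal sketch of parallel Bloom filters used for membership with classification: one Bloom filter layer per class, with every layer probed on each query. It illustrates only the benchmark, not the proposed CMAC scheme. The class names `BloomFilter` and `ParallelBloomFilters`, the salted SHA-256 hashing, and the even split of memory across layers are assumptions made for illustration; the abstract does not specify them.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over text strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, text):
        # Derive k bit positions by salting SHA-256 with the hash index.
        # This hashing choice is an assumption, not the paper's hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for pos in self._positions(text):
            self.bits[pos] = True

    def __contains__(self, text):
        # True if all k bits are set; may be a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(text))


class ParallelBloomFilters:
    """One Bloom filter layer per class; a query probes every layer."""

    def __init__(self, class_labels, total_bits, num_hashes):
        # Assumption: the memory budget is split evenly across the layers.
        bits_per_layer = total_bits // len(class_labels)
        self.layers = {
            label: BloomFilter(bits_per_layer, num_hashes)
            for label in class_labels
        }

    def add(self, text, label):
        self.layers[label].add(text)

    def classify(self, text):
        # Every layer is tested, so each layer is a separate chance
        # for a false positive on a string that was never inserted.
        return [label for label, layer in self.layers.items() if text in layer]


if __name__ == "__main__":
    pbf = ParallelBloomFilters(["spam", "ham"], total_bits=2048, num_hashes=3)
    pbf.add("free money", "spam")
    pbf.add("meeting at noon", "ham")
    print(pbf.classify("free money"))   # always includes "spam"
    print(pbf.classify("never added"))  # usually empty, but not guaranteed
```

If each layer independently has false-positive rate p, a string that belongs to no class is wrongly accepted by at least one of c layers with probability 1 - (1 - p)^c, which grows with the number of classes. This is the effect the abstract attributes to performing membership determination on each parallel layer.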
