A hybrid CPU-FPGA system for high-throughput (10 Gb/s) streaming document classification

Processing large volumes of information in real time demands substantial computational power and therefore consumes a significant amount of energy. As the volume of data produced keeps growing, energy-efficient, high-performance information processing systems are becoming a necessity. We present a hybrid CPU-FPGA system for high-throughput classification of streams of textual documents (e.g. emails or web pages). The current system parses the document stream on a multicore CPU and classifies the parsed stream on Field-Programmable Gate Arrays (FPGAs). As an example, we demonstrate a Naive Bayes classifier on the TREC Aquaint dataset. Our current solution can classify 10 Gb/s of internet traffic in real time. We aim to raise the throughput to 100 Gb/s by moving the parser into the FPGA design.
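To make the classification stage more concrete, here is a minimal software sketch of multinomial Naive Bayes scoring over an already-parsed document stream. It assumes the CPU-side parser emits each document as a list of term strings. The hashed, fixed-size log-probability table (`TABLE_SIZE`, `term_index`) is a hypothetical stand-in for an FPGA-friendly lookup memory and is not taken from the paper; the function names `train`, `classify` and `classify_stream` are likewise illustrative. Each class score is the class log prior plus a sum of table lookups, one per term.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical table size: a fixed-size, hash-indexed weight table is an
# assumption meant to mimic an on-chip lookup memory, not the paper's design.
TABLE_SIZE = 1 << 16

Priors = Dict[str, float]
Tables = Dict[str, List[float]]


def term_index(term: str) -> int:
    """Map a term to a slot in the fixed-size weight table (hypothetical)."""
    # Python's built-in hash() is salted per process for str, so use a
    # stable 32-bit FNV-1a hash instead.
    h = 2166136261
    for byte in term.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h % TABLE_SIZE


def train(docs: Iterable[Tuple[List[str], str]]) -> Tuple[Priors, Tables]:
    """Fit multinomial Naive Bayes with Laplace (add-one) smoothing.

    Returns per-class log priors and, per class, a table of
    log P(slot | class) over all TABLE_SIZE hashed slots.
    """
    class_docs: Counter = Counter()
    slot_counts: Dict[str, Counter] = {}
    for terms, label in docs:
        class_docs[label] += 1
        slot_counts.setdefault(label, Counter()).update(term_index(t) for t in terms)

    total_docs = sum(class_docs.values())
    priors = {c: math.log(n / total_docs) for c, n in class_docs.items()}
    tables = {}
    for c, counts in slot_counts.items():
        denom = sum(counts.values()) + TABLE_SIZE  # add-one over every slot
        tables[c] = [math.log((counts[i] + 1) / denom) for i in range(TABLE_SIZE)]
    return priors, tables


def classify(terms: List[str], priors: Priors, tables: Tables) -> str:
    """Score each class as log prior + sum of per-term log likelihoods; return the best."""
    scores = {
        c: priors[c] + sum(tables[c][term_index(t)] for t in terms)
        for c in priors
    }
    return max(scores, key=scores.get)


def classify_stream(stream: Iterable[List[str]], priors: Priors, tables: Tables) -> Iterator[str]:
    """Classify parsed documents one at a time, as they arrive."""
    for terms in stream:
        yield classify(terms, priors, tables)


if __name__ == "__main__":
    training = [
        (["cheap", "pills", "offer"], "spam"),
        (["meeting", "agenda", "monday"], "ham"),
        (["cheap", "offer", "now"], "spam"),
        (["project", "meeting", "notes"], "ham"),
    ]
    priors, tables = train(training)
    stream = [["cheap", "offer"], ["meeting", "notes"]]
    print(list(classify_stream(stream, priors, tables)))  # prints ['spam', 'ham']
```

Working in log space turns the product of per-term probabilities into a sum, so each term costs one table read and one addition, a lookup-and-accumulate loop that lends itself to pipelining. A hardware version would presumably also use fixed-point rather than floating-point weights, but that is an assumption, not something the abstract states.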
