Hardware acceleration for similarity measurement in natural language processing

The continuation of Moore's law scaling in the absence of Dennard scaling motivates an emphasis on energy-efficient, accelerator-based designs for future applications. In natural language processing, the conventional approach to automatically analyzing vast text collections, scale-out processing, incurs high energy and hardware costs because the central compute-intensive step of similarity measurement often entails pairwise, all-to-all comparisons. We propose a custom hardware accelerator for similarity measures that leverages data streaming, memory latency hiding, and parallel computation across variable-length threads. We evaluate our design through a combination of architectural simulation and RTL synthesis. When executing the dominant kernel in a semantic indexing application for documents, we demonstrate throughput gains of up to 42× and 58× lower energy per similarity computation compared to an optimized software implementation, while requiring less than 1.3% of the area of a conventional core.
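
To make the cost of the pairwise, all-to-all step concrete, the following is a minimal software sketch, not the accelerator design or the paper's kernel. It assumes documents are represented as sparse term-weight dictionaries and uses cosine similarity as the measure; both choices, and the names `cosine_similarity` and `all_pairs_similarity`, are illustrative assumptions rather than details taken from the text above.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts.

    The sparse-dict representation and the choice of cosine similarity
    are assumptions made for illustration.
    """
    # Iterate over the shorter vector so the dot product touches fewer terms.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pairs_similarity(docs):
    """Compare every unordered pair of documents: n * (n - 1) / 2 comparisons."""
    return {
        (i, j): cosine_similarity(docs[i], docs[j])
        for i, j in combinations(range(len(docs)), 2)
    }


if __name__ == "__main__":
    docs = [
        {"energy": 1.0, "core": 2.0},
        {"energy": 1.0, "core": 2.0},
        {"parser": 3.0},
    ]
    for pair, score in all_pairs_similarity(docs).items():
        print(pair, round(score, 3))
```

The number of comparisons grows quadratically with the collection size, and each comparison's work depends on the lengths of the two vectors, which is consistent with the abstract's emphasis on streaming data and handling variable-length work.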
