Pinned OS/Services: A Case Study of XML Parsing on Intel SCC

Chip design is heading towards integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (or to special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), a widely used standard for data transfer and storage. We have implemented and evaluated a design that ports the XML parsing service onto the Intel 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it can provide considerable energy savings. However, we also identified heavy performance penalties introduced on the memory side, which make the parsing service bloated. Hence, as a further step, we propose a memory-side hardware accelerator for XML parsing. With the specified hardware design, we can further enhance the performance gain and energy efficiency: performance improves by 20% with a 12.27% energy reduction.
