Optimizations in Stream Programming for Multimedia Applications

Multimedia applications are the dominant workload in desktop and mobile computing. Such applications regularly process continuous sequences of data and can be naturally expressed in the stream programming domain to take advantage of domain-specific optimizations. Exploiting characteristics specific to multimedia programs can further improve performance for this class of programs. This thesis identifies many multimedia applications that maintain induction variable state, which directly inhibits data parallelism. We demonstrate that it is essential to recognize and parallelize filters with induction variable state to enable scalable parallelization. We eliminate such state by introducing a new language construct that automatically returns the current iteration number of a target filter. This thesis also exploits the fact that multimedia applications tolerate some inaccuracy in program output. We apply a memoization technique that exploits this tolerance and the repetitive nature of multimedia data. We provide a runtime system that automatically tunes the memoization capabilities for performance and output quality. These optimizations are implemented in the StreamIt programming language. A case study of the MPEG benchmark demonstrates the necessity of parallelizing induction variable state, as well as the performance improvements and quality control of our memoization technique.

Thesis Supervisor: Saman Amarasinghe
Title: Professor
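The two ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's StreamIt implementation: the names `StatefulScale`, `stateless_scale`, and `ApproxMemo` are hypothetical, the `iteration` argument stands in for the iteration-number construct, and input quantization is a simple stand-in for whatever tolerance-aware matching the thesis actually uses. Runtime tuning of the tolerance is not modeled.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class StatefulScale:
    """Filter that carries an induction variable between firings.

    Because each firing depends on the counter left by the previous one,
    the firings must run in order.
    """

    def __init__(self) -> None:
        self.count = 0  # induction variable state

    def work(self, x: int) -> int:
        y = x * (self.count % 4)
        self.count += 1
        return y


def stateless_scale(x: int, iteration: int) -> int:
    """Same computation, with the iteration number supplied from outside.

    `iteration` is an assumed stand-in for a construct that returns the
    current iteration number of the filter.
    """
    return x * (iteration % 4)


class ApproxMemo:
    """Memoization that treats nearby inputs as identical.

    Inputs are quantized to multiples of `step` (an assumed tolerance knob);
    inputs that fall in the same quantization cell reuse the cached output.
    """

    def __init__(self, fn: Callable[[Sequence[float]], float], step: float) -> None:
        self.fn = fn
        self.step = step
        self.table: Dict[Tuple[int, ...], float] = {}
        self.hits = 0

    def __call__(self, block: Sequence[float]) -> float:
        key = tuple(round(v / self.step) for v in block)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        out = self.fn(block)
        self.table[key] = out
        return out


data: List[int] = [1, 2, 3, 4, 5]

# Sequential execution with induction variable state.
stateful = StatefulScale()
sequential = [stateful.work(x) for x in data]

# Stateless firings can be evaluated in any order (here: reversed),
# then placed back by index.
out_of_order = [0] * len(data)
for i in reversed(range(len(data))):
    out_of_order[i] = stateless_scale(data[i], i)

# Approximate memoization: the second block is close to the first,
# so the cached result is returned instead of recomputing.
memo = ApproxMemo(lambda b: float(sum(b)), step=1.0)
first = memo([1.0, 2.0])
second = memo([1.1, 2.2])
```

A larger `step` in this sketch yields more cache hits at the cost of larger output error, which is the kind of performance and quality trade-off the abstract says the runtime system tunes.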
