Automatic Parallelism for Dataflow Graphs
This paper presents a novel algorithm to automate high-level parallelization from graph-based data structures representing data flow. This automatic optimization yields large performance improvements for multi-core machines running host-based applications. Results of these advances are shown through their incorporation into the audio processing engine Application Rendering Immersive Audio (ARIA) presented at AES 117. Although the ARIA system is the target framework, the contributions presented in this paper are generic and therefore applicable in a variety of software such as Pure Data and Max/MSP, game audio engines, non-linear editors and related systems. Additionally, the parallel execution paths extracted are shown to give effectively optimal cache performance, yielding significant speedup for such host-based applications.

1. BACKGROUND AND MOTIVATION

Graph-based data structures have become popular representations within audio processing software, especially in visual dataflow programming systems such as Pure Data [1], Max/MSP [2] and third-party APIs and libraries like Sonic Flow [3]. This paper presents an algorithm for efficient automated extraction of parallelism from dataflow graphs.

As the trend toward multiple-core processors replaces the former trend toward ever-greater processor clock rates, computational efficiency increasingly relies on parallel processing. This is especially true for host-based processing software common in application areas ranging from game audio to non-linear editing, and from effects processing to room correction software.

Audio software stands to benefit significantly from parallelism provided by algorithms that leverage multiple cores. Yet parallel programming presents many challenges and pitfalls for the programmer. These pitfalls are detrimental to efficiency. Developers with expertise in audio development are rarely the same ones with expertise in concurrent programming, and vice versa.
Sadek, Automatic Parallel Dataflow Graphs. AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7.

This paper presents an algorithm to automate parallelization such that the parallel code need only be written once, allowing concurrency to be abstracted away from the audio programmer. Thus, developers with differing expertise (i.e., systems and concurrent programming vs. DSP and filter design) can focus on their areas of specialty without conflating these separate issues. This division of labor is especially useful in software such as game systems, APIs, non-linear editors, and end-user applications where high-level tasks can be run in parallel (tracks, effects, etc.).

These sorts of applications commonly use buffer sizes ranging from as many as 2048 samples per buffer to as few as 6 samples in low-latency applications. With these relatively small buffer sizes, the cost of dividing buffers and reassembling the solution often outweighs the potential speedup of parallelization. Instead, these applications use very large numbers of buffers internally that are processed separately as atomic chunks. This computation model holds for interactive applications such as game audio, digital audio workstations, immersive audio systems, and many computer music applications, to name a few. Therefore our focus is on these types of applications. Our approach remains valid for large parallel problems that can be modeled in a dataflow graph.

1.1. Graph Representation

Formally, a graph is a set of nodes that are connected by edges. See Figure 1 for a simple example. Note that the edges in the figure have arrows. These arrows indicate that the graph is directed. That is, there is a direction associated with each connection between nodes. For example, in Figure 1 there is an edge that goes from Node A to Node C, but there is no edge from Node C to Node A.

Figure 1 A simple directed graph containing a cycle.
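The notion of direction can be made concrete with a small adjacency-list sketch in Python. Only one edge is taken from the text (A to C, with no edge back from C to A); the remaining edges, the `DirectedGraph` class, and its method names are assumptions chosen for illustration, since the figure itself is not reproduced here.

```python
from collections import defaultdict


class DirectedGraph:
    """Adjacency-list directed graph: each node maps to the set of its successors."""

    def __init__(self):
        self.successors = defaultdict(set)

    def add_edge(self, src, dst):
        """Add a one-way edge src -> dst (the reverse edge is NOT implied)."""
        self.successors[src].add(dst)
        self.successors[dst]  # touch dst so it is registered as a node

    def has_edge(self, src, dst):
        return dst in self.successors.get(src, set())

    def nodes(self):
        return set(self.successors)


# Hypothetical edge set: only A -> C is stated in the text; the rest is assumed.
figure1 = DirectedGraph()
for src, dst in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "B")]:
    figure1.add_edge(src, dst)

print(figure1.has_edge("A", "C"))  # True: the edge runs from A to C
print(figure1.has_edge("C", "A"))  # False: no edge runs from C back to A
```

Storing only successors keeps each edge one-way, which is what distinguishes a directed graph from an undirected one.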
A path through the graph is a sequential series of connected nodes. A path that returns to its starting point is called a cycle. The graph in Figure 1 contains a cycle formed by the edges connecting nodes B, C and D. Graphs that contain no cycles are acyclic. Directed graphs with no cycles are called Directed Acyclic Graphs (DAGs).

In the context of audio processing, graph nodes represent processes while edges represent signal routing or dataflow. As such, the simple example in Figure 2 implements high-pass filtered noise. This paradigm is known as the dataflow programming model because paths through the graph represent the flow of data through processing steps.

Figure 2 A simple dataflow example of high-pass filtered noise.
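As a rough illustration of the dataflow model, the sketch below evaluates a small graph in dependency order using Kahn's topological sort, which also reports a cycle when one exists. This is a standard textbook technique and not the parallelization algorithm this paper presents. The node names, the `run_graph` helper, the buffer length, and the one-pole high-pass filter are all assumptions made for the example; the source states only that Figure 2 implements high-pass filtered noise.

```python
import random
from collections import deque


def topological_order(edges, nodes):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def noise(_inputs, n=64, seed=0):
    """Source node (assumed): one buffer of uniform white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def high_pass(inputs, alpha=0.9):
    """One-pole high-pass (assumed): y[i] = alpha * (y[i-1] + x[i] - x[i-1])."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for sample in inputs[0]:
        prev_y = alpha * (prev_y + sample - prev_x)
        prev_x = sample
        y.append(prev_y)
    return y


def run_graph(processes, edges):
    """Evaluate each node once, after all nodes that feed it."""
    nodes = list(processes)
    predecessors = {n: [] for n in nodes}
    for src, dst in edges:
        predecessors[dst].append(src)
    outputs = {}
    for node in topological_order(edges, nodes):
        inputs = [outputs[p] for p in predecessors[node]]
        outputs[node] = processes[node](inputs)
    return outputs


# Nodes are processes; the single edge routes the noise buffer into the filter.
processes = {"noise": noise, "high_pass": high_pass}
edges = [("noise", "high_pass")]
buffers = run_graph(processes, edges)
print(len(buffers["high_pass"]))  # 64
```

Nodes that sit in `ready` at the same time do not depend on one another, which is the kind of independence a parallel scheduler could exploit. Passing the B, C, D cycle as `edges` makes `topological_order` raise `ValueError`, since no node in a cycle ever reaches an in-degree of zero.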
[1] Chris Kyriakakis, et al. A Novel Multichannel Panning Method for Standard and Arbitrary Loudspeaker Configurations, 2004.
[2] Ronald L. Rivest, et al. Introduction to Algorithms, third edition, 2009.
[3] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[4] Ramy Sadek. A Host-Based Real-Time Multichannel Immersive Sound Playback and Processing System, 2004.