4.2 A 12nm Autonomous-Driving Processor with 60.4TOPS, 13.8TOPS/W CNN Executed by Task-Separated ASIL D Control

Autonomous driving systems, when deployed to market, will require accurate and high-speed recognition, judgment and operation. Convolutional neural networks (CNNs) require large amounts of computation for pattern recognition. The CNN performance required for level-3 autonomous driving systems is 120TOPS or higher. As shown in Fig. 4.2.1, recent CNN implementations are oriented toward high performance and low power [1]–[4]. In previously reported SoCs for autonomous driving [5], power consumption is more than 70W, requiring a heavy and expensive water-cooling system. To save weight and cost by using an air-cooling system for an in-vehicle electronic control unit (ECU), power consumption of less than 25W is indispensable, and around half of that, roughly 12.5W, can be assigned to the SoC. Consequently, achieving 120TOPS at about 10TOPS/W is necessary for an autonomous driving system. At the same time, SoCs for autonomous driving must also achieve ASIL D, the highest safety level defined in ISO 26262. Dual-core lock step (DCLS) is a technique to satisfy ASIL D by executing the same process in parallel on duplicated hardware and comparing the results. However, simple full-time DCLS doubles power consumption and degrades power efficiency. In this paper, we achieve 60.4TOPS CNN performance with 13.8TOPS/W efficiency in an application processor having high-reliability ASIL D targeted safety mechanisms for autonomous driving systems. One- and two-device configurations, achieving 60 and 120TOPS, respectively, offer practical solutions for ADAS and autonomous driving products.
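The power-budget reasoning above can be checked with a short calculation. The following is a minimal sketch, not part of the reported design: the constants are the figures quoted in the text, while the function names, the exact 50% SoC share ("around half"), and the assumption that CNN power equals throughput divided by efficiency and scales linearly across two devices are illustrative assumptions.

```python
# Figures quoted in the text.
ECU_POWER_BUDGET_W = 25.0   # air-cooled ECU power limit
REQUIRED_TOPS = 120.0       # level-3 autonomous driving requirement
DEVICE_TOPS = 60.4          # reported CNN performance per device
DEVICE_TOPS_PER_W = 13.8    # reported CNN power efficiency

# Assumption: "around half" of the ECU budget is taken as exactly 0.5.
SOC_SHARE = 0.5


def required_efficiency(required_tops, ecu_budget_w, soc_share):
    """TOPS/W needed to deliver required_tops within the SoC power share."""
    return required_tops / (ecu_budget_w * soc_share)


def cnn_power_w(tops, tops_per_w):
    """CNN power implied by a throughput and an efficiency figure.

    Assumption: power = throughput / efficiency at the quoted operating
    point; this covers CNN execution only, not the whole SoC.
    """
    return tops / tops_per_w


soc_budget_w = ECU_POWER_BUDGET_W * SOC_SHARE
target_eff = required_efficiency(REQUIRED_TOPS, ECU_POWER_BUDGET_W, SOC_SHARE)
one_device_w = cnn_power_w(DEVICE_TOPS, DEVICE_TOPS_PER_W)

# Assumption: two devices scale linearly in both throughput and power.
two_device_tops = 2 * DEVICE_TOPS
two_device_w = 2 * one_device_w

print(f"SoC budget: {soc_budget_w:.1f} W, target: {target_eff:.1f} TOPS/W")
print(f"One device: {DEVICE_TOPS:.1f} TOPS at {one_device_w:.2f} W")
print(f"Two devices: {two_device_tops:.1f} TOPS at {two_device_w:.2f} W")
```

Under these assumptions the target works out to 9.6TOPS/W, which the text rounds to 10TOPS/W, and a two-device configuration delivers 120.8TOPS at roughly 8.8W of CNN power, inside the 12.5W SoC share.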