Si2 Low Power Forum
Power and Energy Efficiency in the Age of AI
Cosponsored by IEEE CEDA
Monday, July 11, 2022
11:30 a.m. – 3:30 p.m.
San Francisco Marquis Hotel
AMA Conference Center
Room 215
Introduction and Lunch
Si2 Unified Power Model and IEEE 2416 Update
Nagu Dhanwada, Senior Technical Staff Member, IBM
IEEE P2416 Working Group Chair
INDUSTRY PERSPECTIVES
Where designs are heading, covering AI-infused, high-performance microprocessors and cutting-edge designs in mobile, embedded and IoT, and AI-specific accelerators
Pitfalls of Power Estimation for AI & Vision SoCs and How to Avoid Them
Fergus Casey, R&D Director, ARC Processor IP
Synopsys
With the expansion of deep learning inference applications, performance requirements and resulting power dissipation vary significantly. As the power dissipation must fit within the power budget of the deep learning SoC and battery limitations of the target device, it is increasingly important to architect and design AI inference engines with power considerations in mind. This presentation will guide you through wide-ranging power reduction techniques to estimate and reduce power from the architecture phase through to IP/SoC sign-off.
Machine Learning Hardware: from Milliwatt to Kilowatt
Patrick Groeneveld, Principal Engineer
Cerebras Systems
The extreme compute requirements of machine learning drive an entirely new generation of hardware. Compute-intensive ML training is generally done in data centers using re-purposed GPUs. This provides cost-efficient floating-point compute hardware that interfaces with the well-known TensorFlow ML platform. To run on a GPU, however, the training data and weights need to be segmented to fit the limited on-chip memory and bandwidth. Cerebras takes a radically different approach with a massive 22 x 22 cm monolithic chip that contains over 850,000 powerful compute cores with 2.4 trillion transistors. This massive wafer-scale engine allows the entire ML model, including all weights, to remain stationary in hardware while only the training data is streamed in at very high speed. The flip side is a whopping 20 kW power consumption.
Systems Not Silicon – Low Power Confronts Machine Learning
Chris Rowen, Vice President of Engineering
Cisco
Machine learning silicon is an industry passion, but the discussion too often fixates on hypothetical efficiency metrics and idealized workloads. A career's worth of technical leadership experience suggests two key shifts in perspective:
Think Systems: Consider end-to-end application workloads, not just the inference inner loop, for high-volume applications like real-time media intelligence.
Broaden Goals: Look at both execution and implementation efficiencies to make good engineering and product strategy choices.
ACADEMIC PERSPECTIVES
Design and power analysis using AI techniques such as neural networks, and the current challenges with power analysis tools, models, and flows
Breaking the Silos: The Need for a Holistic Design Methodology in Tiny ML Systems
Dr. Boris Murmann, Professor of Electrical Engineering
Stanford University
As machine learning algorithms are marching closer to the physical sensors, the design of analog and digital subsystems is becoming more intertwined. In this context, this talk will articulate the need for a holistic design flow that considers cost functions across the system stack.
Formalizing Design-space Exploration for Flexible AI Accelerators
Dr. Tushar Krishna, Assistant Professor
School of Electrical and Computer Engineering
Georgia Institute of Technology
The proliferation of AI across a variety of domains has led to the rise of domain-specific HW accelerators. This talk will discuss techniques to model the design space of these AI accelerators, formally breaking the design space into hardware resource assignment, dataflow, and tiling. We will also introduce techniques for sample-efficient design-space exploration for flexible AI accelerators.
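The three-axis decomposition the abstract names can be made concrete with a small sketch. The parameter values and cost model below are illustrative assumptions, not the talk's actual formalism or tooling:

```python
# Illustrative sketch of design-space exploration over the three axes the
# abstract names: hardware resource assignment, dataflow, and tiling.
# All values and the cost model are assumptions for demonstration only.
from itertools import product

pe_counts  = [64, 256]                                   # hardware resources
dataflows  = ["weight-stationary", "output-stationary"]  # dataflow choice
tile_sizes = [16, 32]                                    # loop-tiling factor

def toy_cost(pes, dataflow, tile):
    """Stand-in cost model (smaller is better): rewards more PEs and
    more data reuse; weight-stationary is assumed to reuse a full tile."""
    reuse = tile if dataflow == "weight-stationary" else tile // 2
    return 1e6 / (pes * max(reuse, 1))

# Exhaustive search over the 2 x 2 x 2 = 8 design points.
best = min(product(pe_counts, dataflows, tile_sizes),
           key=lambda cfg: toy_cost(*cfg))
print(best)  # -> (256, 'weight-stationary', 32)
```

A real flow replaces the exhaustive loop with the sample-efficient search the talk describes, since practical design spaces are far too large to enumerate.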
Low-Power Design of Neural Network Accelerators
Dr. Massoud Pedram
Professor of Electrical and Computer Engineering
University of Southern California
Low-power design of custom neural network inference accelerators for battery-powered, wireless mobile and IoT devices is a key requirement. Dr. Pedram will describe an approach for minimizing power consumption of such accelerators by replacing costly fixed-point multiply-and-accumulate operations with simple Boolean or multi-valued logic operations. Results on various types of vision networks will be presented to show the efficacy of this approach.
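One well-known instance of replacing multiply-and-accumulate with Boolean logic is the XNOR-popcount dot product used in binarized neural networks. The sketch below illustrates that general idea only; it is an assumption for illustration, not Dr. Pedram's specific method:

```python
# Sketch: dot product of two {+1, -1} vectors via bitwise agreement
# (XNOR) plus popcount, in place of fixed-point multiply-accumulate.
# Encoding assumption: bit 1 represents +1, bit 0 represents -1.

def binarize(values):
    """Map real-valued inputs to the bit encoding (>= 0 -> 1, else 0)."""
    return [1 if v >= 0 else 0 for v in values]

def xnor_popcount_dot(a_bits, b_bits):
    """Each bitwise agreement contributes +1, each disagreement -1,
    so dot = matches - mismatches = 2 * popcount(XNOR) - n."""
    n = len(a_bits)
    matches = sum(1 for a, b in zip(a_bits, b_bits) if a == b)
    return 2 * matches - n

x = binarize([0.5, -1.2, 3.0, -0.1])  # -> [1, 0, 1, 0]  i.e. (+1, -1, +1, -1)
w = binarize([1.0, 2.0, -0.5, -2.0])  # -> [1, 1, 0, 0]  i.e. (+1, +1, -1, -1)
print(xnor_popcount_dot(x, w))        # -> 0, matching (+1)(+1)+(-1)(+1)+(+1)(-1)+(-1)(-1)
```

In hardware, the XNOR and popcount are a handful of gates per element, which is the source of the power savings over a fixed-point multiplier array.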
Panel Discussion
The current and future state of power modeling standards, including IEEE Standard 2416-2019.
- Jerry Frenkil, IEEE P2416 Working Group Vice Chair
- Daniel Cross, Senior Principal Solutions Engineer, Cadence Design Systems, P1801 Working Group Member
- Kaladhar Radhakrishnan, Intel Fellow