Si2 Low Power Forum

Power and Energy Efficiency in the Age of AI

Cosponsored by IEEE CEDA

Monday, July 11, 2022
11:30 a.m. – 3:30 p.m.
San Francisco Marquis Hotel
AMA Conference Center
Room 215

Introduction and Lunch
Si2 Unified Power Model and IEEE 2416 Update
Nagu Dhanwada, Senior Technical Staff Member, IBM
IEEE P2416 Working Group Chair

INDUSTRY PERSPECTIVES

Where designs are heading, covering AI-infused, high-performance microprocessors and cutting-edge designs in mobile, embedded and IoT, and AI-specific accelerators

Pitfalls of Power Estimation for AI & Vision SoCs and How to Avoid Them
Fergus Casey, R&D Director, ARC Processor IP
Synopsys

With the expansion of deep learning inference applications, performance requirements and the resulting power dissipation vary significantly. Because power dissipation must fit within the power budget of the deep learning SoC and the battery limitations of the target device, it is increasingly important to architect and design AI inference engines with power in mind. This presentation will guide you through wide-ranging techniques to estimate and reduce power, from the architecture phase through to IP/SoC sign-off.

Machine Learning Hardware: from Milliwatt to Kilowatt
Patrick Groeneveld, Principal Engineer
Cerebras Systems

The extreme compute requirements of machine learning drive an entirely new generation of hardware. Compute-intensive ML training is generally done in data centers using repurposed GPUs, which provide cost-efficient floating-point compute hardware that interfaces with the well-known TensorFlow ML platform. To run on a GPU, however, the training data and weights need to be segmented to fit the limited on-chip memory and bandwidth. Cerebras takes a radically different approach with a massive 22 x 22 cm monolithic chip that contains over 850,000 powerful compute cores with 2.4 trillion transistors. This wafer-scale engine allows the entire ML model, including all weights, to remain stationary in hardware while only the training data is streamed in at very high speed. The flip side is a whopping 20 kW power consumption.

Systems Not Silicon – Low Power Confronts Machine Learning
Chris Rowen, Vice President of Engineering
Cisco

Machine learning silicon is an industry passion, but the discussion too often fixates on hypothetical efficiency metrics and idealized workloads. A career's worth of technical leadership experience suggests two key shifts in perspective:

  • Think Systems: Consider the end-to-end application workloads, not just the inference inner loop, for high-volume applications like real-time media intelligence.

  • Broaden Goals: Look at both execution and implementation efficiencies to make good engineering and product strategy choices.

ACADEMIC PERSPECTIVES

Design and power analysis using AI techniques such as neural networks, and the current challenges with power analysis tools, models, and flows

Breaking the Silos: The Need for a Holistic Design Methodology in Tiny ML Systems
Dr. Boris Murmann, Professor of Electrical Engineering
Stanford University

As machine learning algorithms are marching closer to the physical sensors, the design of analog and digital subsystems is becoming more intertwined. In this context, this talk will articulate the need for a holistic design flow that considers cost functions across the system stack.

Formalizing Design-Space Exploration for Flexible AI Accelerators
Dr. Tushar Krishna, Assistant Professor
School of Electrical and Computer Engineering
Georgia Institute of Technology

The proliferation of AI across a variety of domains has led to the rise of domain-specific hardware accelerators. This talk will discuss techniques to model the design space of these AI accelerators, formally breaking it into hardware resource assignment, dataflow, and tiling. We will also introduce techniques for sample-efficient design-space exploration for flexible AI accelerators.

Low-Power Design of Neural Network Accelerators
Dr. Massoud Pedram
Professor of Electrical and Computer Engineering
University of Southern California

Low-power design of custom neural network inference accelerators for battery-powered, wireless mobile and IoT devices is a key requirement. Dr. Pedram will describe an approach for minimizing power consumption of such accelerators by replacing costly fixed-point multiply-and-accumulate operations with simple Boolean or multi-valued logic operations. Results on various types of vision networks will be presented to show the efficacy of this approach.
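To make the abstract's idea concrete, here is a minimal sketch (my own illustration, not material from the talk) of one well-known Boolean-logic substitution for the fixed-point multiply-and-accumulate: in a binarized network, a dot product of {+1, -1} vectors can be computed with XNOR and popcount instead of multipliers. The function names are hypothetical.

```python
def binarize(x):
    """Map real values to signs {+1, -1}, encoded as bits {1, 0}."""
    return [1 if v >= 0 else 0 for v in x]

def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two sign vectors using XNOR + popcount.
    Matching bits (XNOR true) contribute +1, mismatches contribute -1."""
    n = len(a_bits)
    matches = sum(1 for a, b in zip(a_bits, b_bits) if not (a ^ b))
    return 2 * matches - n

# Example: binarize activations and weights, then accumulate without multipliers.
acts = binarize([0.5, -1.2, 0.3, 0.9])   # bits [1, 0, 1, 1]
wts  = binarize([-0.7, -0.1, 0.4, 0.2])  # bits [0, 0, 1, 1]
print(xnor_popcount_dot(acts, wts))      # -> 2, the dot product of the sign vectors
```

In hardware, the XNOR gates and a popcount tree replace the fixed-point multiplier array, which is where the power savings the abstract refers to come from.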

Panel Discussion
The current and future state of power modeling standards, including IEEE Standard 2416-2019.

  • Jerry Frenkil, IEEE P2416 Working Group Vice Chair
  • Daniel Cross, Senior Principal Solutions Engineer, Cadence Design Systems, P1801 Working Group Member
  • Kaladhar Radhakrishnan, Intel Fellow