
Untether Unveils 2-PFLOPS AI Chip, Edge Roadmap

www.eetimes.com, Aug. 23, 2022

At Hot Chips this week, Untether unveiled its second-generation architecture for AI inference, the first chip built on that architecture, and its plans to expand into edge and endpoint accelerators.

Untether's new architecture, internally codenamed Boqueria, addresses three trends: very large neural networks, including transformer networks in natural language processing and beyond; endpoint applications that demand power efficiency; and applications that require performance and power efficiency combined with prediction accuracy.

The first chip to use the Boqueria architecture, SpeedAI, is a data center inference accelerator capable of 2 PFLOPS of FP8 performance at peak power consumption (66 W), or 30 TFLOPS/W in a more typical 30-35 W power envelope. (Untether's first-generation chip, RunAI, delivered 500 TOPS of INT8.)
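The headline figures above are internally consistent, which a quick back-of-the-envelope check confirms. The sketch below uses only numbers stated in the article (2 PFLOPS, 66 W); the variable names are illustrative.

```python
# Sanity-check the quoted efficiency: 2 PFLOPS of FP8 at 66 W peak power.
peak_flops = 2e15      # 2 PFLOPS (FP8), from the article
peak_power_w = 66      # peak power consumption, from the article

# Efficiency in TFLOPS/W (1 TFLOPS = 1e12 FLOPS)
efficiency_tflops_per_w = peak_flops / peak_power_w / 1e12

print(round(efficiency_tflops_per_w, 1))  # ~30.3, in line with the quoted 30 TFLOPS/W
```

Note that the peak-power figure alone already works out to roughly 30 TFLOPS/W; the article's 30-35 W envelope implies the chip sustains about half the peak throughput at half the power.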

This level of performance translates to running BERT-base inference at 750 queries per second per Watt, which the company says is 15× the performance of a state-of-the-art GPU.
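To translate the per-watt BERT-base figure into absolute throughput, one can multiply by the power envelope. This is an illustrative calculation only: the article states the 750 queries/s/W figure and the 30-35 W envelope separately, and does not itself quote an absolute queries-per-second number.

```python
# Illustrative: absolute BERT-base throughput implied by the per-watt figure,
# assuming the article's 30-35 W power envelope applies to this benchmark.
qps_per_watt = 750  # BERT-base queries per second per watt, from the article

for power_w in (30, 35):
    print(f"{power_w} W -> {qps_per_watt * power_w} queries/s")
# 30 W -> 22500 queries/s
# 35 W -> 26250 queries/s
```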

The 35 × 35-mm chip is built on TSMC's 7-nm process and uses more than 1,400 optimized RISC-V cores, the most EE Times has seen in a commercial chip (beating the previous record holder, Esperanto).

"[The performance] is a convergence of different factors," Bob Beachler, VP of product at Untether, told EE Times. "It's a combination of a lot of things, including circuit design, data types, understanding how neural networks operate–how does a transformer operate compared to a convolutional network?–all of these things we've been able to embody in our second-generation chip."

Untether carefully considered the balance between flexibility, performance, and scalability when working on Boqueria.

"To make general-purpose AI compute architecture, you have to have the right level of granularity and flexibility to efficiently be able to run this plethora of neural networks and be able to scale from small to large," Beachler said. Accuracy is also important for inference workloads, he added, particularly for recommendation where any percentage points of accuracy loss can mean substantial financial losses, and for safety-oriented applications like autonomous driving.


